I spend a lot of time on client sites and time and again, one of the mistakes that I see people making, is trying to start with large projects. I think one of my all time favorite quotes about IT is:
Any successful large IT system used to be a successful small IT system.
The next time you're thinking about creating a project that's going to have a big bang outcome, please remember this. The history of the industry is that it really is likely to be a big bang, and not in terms of being a big success like you'd hoped for.
A staggeringly high percentage of large IT projects fail.
This is even more important in business intelligence projects. The industry is littered with companies that have spent a fortune on BI projects, and almost no-one in those companies can, or do, use the outcomes. It's a sad but true statistic.
Asking users what they want, gathering a large amount of detail, then building it, testing it, and delivering it to them sounds good, but it almost never works. Unfortunately, it's how many larger organizations think projects need to be managed. They assume that creating an IT project is like building a bridge.
The first problem is that the users don't know what they want. Don't blame them later because you think they didn't give you the right instructions. That's your fault for assuming they can describe exactly what they want. In many cases, until they see some of the outcomes, they won't start to understand what the system will do, and until then, they won't realize what they really need.
Second, no matter how clearly you document their requirements, it won't be what they need. One of my Kiwi buddies Dave Dustin was having a bad day once, and I remember he mentioned that he was going to just spend the day delivering exactly what people asked for. That was wonderful and beautifully insightful, because we all know that it would lead to a disaster. It's little surprise. They might have said "sales" but they really meant "profit", and so on.
Finally, the larger the project, the longer it will take to deliver, and by then, even if you'd done it perfectly, the users' needs will have changed, so it still won't be what they want.
When you're starting a BI project, I'd implore you to find a project that has these characteristics:
Small enough to be done in a couple of weeks at most
Based on mocked up data in front of the target users
Large enough to make a significant difference to someone's life or work
Targeted at someone who's important enough to be a champion for the rest of what you want to achieve
(Ideally) targeted at someone "who pays the bills"
Start by doing a great job, and building on that, once you have support.
I don't know if it's considered some sort of modern trend, but what is it with applications now that just swallow errors instead of dealing with them? Is there an edict within these companies that errors should get shown, so they can argue their app doesn't have errors?
I'm working with a SaaS app right now. It does editing. Sometimes when I save, it just doesn't save. No error, just nothing saved. Or every now and then, I find the order of what I've entered just gets changed. Again, no error, but the order was changed.
Worse, sometimes when I then try to correct the order, it shows it as done, but next time I go back to that screen, the order is back the way it was in the first place.
On many occasions, if I close my browser, open it again, and log in, it all works OK again for a while.
But it's not just these types of applications. I've lost count of the number of sites I've been to, where supposedly serious applications are being developed, yet the code is full of try/catch blocks but the catch blocks are empty ie: silently ignoring any errors that occur.
How did we get to the point that this is what passes for application development now? Apps that mostly work and fail silently?
System configurations are becoming more complex all the time. Each and every server, container, and key application and service today has log files that tell you a wealth about what's going on under the covers. But how accessible are those log files in your organization?
If you aren't using a log management tool, you probably should be.
Here are a few easy ones to get started with:
One thing that I do find frustrating with Microsoft tooling at present is the constant churn of product names. A while back, we had Application Insights that could collect details of what was happening within an app. The data for that was stored in a tool called Log Analytics, and it could also collect operating system logs and more. Agents were provided for on-premises systems.
Originally, these tools had different query languages but eventually, the query language for Log Analytics was one that's used. It's awesome to be able to write a query to simply find and filter log details.
For my SQL Server buddies, there were SQL Insights which has now morphed into SQL Server Intelligent Insights along with Azure SQL Database Intelligent Insights. These allow you to capture bunches of info about your SQL Server instances and databases so very simply.
I constantly visit client sites where they have purchased tools for this, and those tools aren't even close to being as useful as these Azure ones. And they don't just work with Microsoft servers and services.
Anyway, these have now all been bundled up again under the name Azure Monitor.
Azure Monitor also offers built-in integration with popular DevOps, issue management, ITSM and SIEM tools. You can use packaged solutions for monitoring specialised workloads, or build your own custom integration using Azure Monitor REST APIs and webhooks.
Another interesting offering from our friends at SolarWinds, is Papertrail. Their claim is "Frustration-free log management. Get started in seconds.
Instantly manage logs from 2 servers… or 2,000". Papertrail seems to be gaining a stronghold in the Linux, MySQL, Ruby, Apache, Tomcat areas along with many others.
In the end, if you aren't using one of these types of tools, you probably should be.
I used to do a lot of work at the operating system and network level. I was always fascinated watching people use network trace tools when they were trying to debug a problem. The challenge was that they had no idea what was normal activity on the network, and what wasn't.
The end result of this is that they'd then spend huge amounts of time chasing down what were really just red herrings.
When you don't know what normal activity looks like, everything looks odd.
Today, I see the same thing with traces of SQL Server activity, either using SQL Profiler (and/or SQL Trace), and Extended Events Profiler. I also see the same thing with insights data sent to Log Analytics, and the outcomes of many expensive SQL Server monitoring tools.
For example, if you are looking at a SQL Server trace, and you see a large number of sp_reset_connection commands. Is that an issue? When would it be an issue, and when is it just normal?
If I see an sp_reset_connection executed on a connection followed by a number of other commands, I know that the application is using connection pooling. If however, I see a bunch of those on the same connection, without any commands executed in between, I know that the application code is opening connections when it doesn't need to. Perhaps it should be opening the connection closer to where it decides if it needs it.
The key point is that it's really important that you learn to use these tools before you have a problem. You need to be able to recognize what's normal, and what isn't.
I spend most of my consulting/mentoring time in larger organizations, many are large financial organizations. In every one now, there is a person heading up a "Cloud Transformation" project, but none of these companies mean the same thing when they talk about these types of projects.
In so many companies, all they are doing is taking their virtual machines and networks that are hosted in some existing hosting provider, and moving them into a public cloud.
Let's be clear: those companies are not making a cloud transformation.
They might be replacing existing infrastructure that's difficult to work with, with dynamic and configurable cloud infrastructure. They might also be outsourcing many of the functions of their difficult-to-work-with IT support teams with the public cloud ones. This can particularly apply to many offshore or offsite IT management providers.
But there's no real transformation going on. In so many cases, they are missing out on the real beauty that can be offered by cloud-based services.
Companies like Microsoft started pushing PaaS services heavily at first, but then realized that many companies just aren't agile enough to be able to start to use them. They now seem to focus on getting the customers into the cloud as the first step (aka lift and shift), then later focusing on getting them to use it properly.
If you aren't putting PaaS services in place, I don't think you're really making a cloud transformation. You may just be upgrading to the Expensive Hosting 2.0 that Damon mentioned.
That may be better than where you were but you shouldn't confuse this with a real transformation.
In a recent post, I mentioned that I often need to do a quick check to see if the schema of two SQL Server databases is the same, and how our GetDBSchemaCoreComparison procedure can make that easy. On a similar vein, I often need to get a detailed comparison of two tables.
In our free SDU Tools for developers and DBAs, there is another stored procedure called GetTableSchemaComparison to make that easy as well. It takes the following parameters and also returns a rowset that's easy to consume programmatically (or by just looking at it):
@Table1DatabaseName sysname -> name of the database containing the first table @Table1SchemaName sysname -> schema name for the first table @Table1TableName sysname -> table name for the first table @Table2DatabaseName sysname -> name of the database containing the second table @Table2SchemaName sysname -> schema name for the second table @Table2TableName sysname -> table name for the second table @IgnoreColumnID bit -> set to 1 if tables with the same columns but in different order are considered equivalent, otherwise set to 0 @IgnoreFillFactor bit -> set to 1 if index fillfactors are to be ignored, otherwise set to 0
Note that it doesn't care if the two tables have different names in the same or two different databases.
You can see the outcome in the main image above.
You can see it in action here:
To become an SDU Insider and to get our free tools and eBooks, please just visit here:
I often need to do a quick check to see if the schema of two SQL Server databases is the same.
In our free SDU Tools for developers and DBAs, there is a stored procedure called GetDBSchemaCoreComparison to make that easy. It takes the following parameters and returns a rowset that's easy to consume programmatically (or by just looking at it):
@Database1 sysname -> name of the first database to check @Database2 sysname -> name of the second database to compare @IgnoreColumnID bit -> set to 1 if tables with the same columns but in different order are considered equivalent, otherwise set to 0 @IgnoreFillFactor bit -> set to 1 if index fillfactors are to be ignored, otherwise set to 0
You can see the outcome in the main image above.
You can see it in action here:
To become an SDU Insider and to get our free tools and eBooks, please just visit here:
Just a short post today to call out something that I'm seeing again and again. It's where organizations purchase all their hardware and software platforms before they start to carry out a proof of concept. This is a very poor option.
I was reading the data strategy for a global company that I was doing consulting work for. They were proudly introducing the new strategy yet I was sitting looking at it, trying to work out what they were thinking. The first step of their plan was to buy everything they needed. The second step was to carry out a proof of concept to see how it would all work (presuming it would work suitably at all).
This is ridiculous.
In that case, I think what's happening is that the IT management wants to seem proactive, buying hardware and software platforms is what they are experienced at, and they want to look like they are "doing something".
Yet, invariably, this locks them into decisions that aren't in the best interests of the organization. Instead of making sensible decisions, they end up making decisions, based on what they have already committed to. And the more expensive that purchase was, the more they will try for years to justify the expenditure decision that they made. Every choice will later be taken, based upon how well it fits with their existing purchase.
Don't do this.
Do everything you can to carry out the proof of concept without buying anything that locks you into a decision path.
Most companies have some sort of ongoing maintenance processes that perform periodic backups of databases. They also have log backups scheduled for any databases that are online and in full recovery model. However, when you first create a database, it might be in full recovery model yet never have had a full backup performed. If your scheduled jobs then try to create a log backup prior to the first full backup, the backup will fail, and that might make your job fail.
One option to avoid this is to always create a backup of a database immediately, as part of the database creation script. You aren't needing the backup, you just want to avoid the next log backup failing if that happens before the next full backup.
The easiest way to do that is a backup to nowhere ie: the NUL device. You can do that via a command like this:
BACKUP DATABASE [MyNewDB] TO DISK = 'NUL';
It might be worth adding that to your database creation scripts for full recovery databases, to avoid throwing errors in your log backup jobs.
I spend a lot of time working with software houses (Microsoft calls these ISVs – Independent Software Vendors). More and more, these companies are looking to convert their on-premises applications to cloud-based SaaS (Software as a Service) offerings.
For many of the ISVs, their on-premises applications are single-tenant ie: they are designed to support a single organization. When they are looking to cloud offerings, one of the first decisions is whether they should create a single database to hold the data for all their client organizations, or whether they should create a separate database for each client organization.
As with most things in computing, there is no one simple answer to this.
Here are the main decision points that I look at:
For me, this is the #1 item. You have to decide how important isolating one client's data from other clients is. Time and again, I hear how "the app does that" but you need to keep in mind that in most multi-tenant models, you are only one faulty WHERE clause away from showing one client, another client's data. In fact, it's usually far harder to get the code correct in multi-tenant databases.
So I think you need to start by considering what the outcome of that would be. For some ISVs, this would be deeply embarrassing but manageable. For other ISVs, this would simply be terminal for the organization.
Imagine the discussion between your CEO and the client whose data was disclosed. How would that discussion go? Would you still have a job? Would you still have an organization?
If you have even the slightest doubt about this, you should lean towards single-tenant ie: a separate database per client. You still have the chance to mess that up, but you are starting in a better place.
This one is pretty easy. In every current cloud provider, single larger databases are more economical than large numbers of smaller databases. I don't think that cost should be your primary concern for this, but if it is, you will lean towards single-database designs.
If you are working in Azure SQL Database though, and haven't looked at their elastic database pools, you should consider them before making your decision.
A key aspect of "cloudiness" is the granularity of providing resources as required, just when they are required. Having separate databases for each client is much more "cloudy". A simple example of this is that each of your clients might require a different level of performance and/or features.
One client might want to run a very low cost test, another might need good solid general performance, another might need the best performance available. If you use separate databases, even clients running the same application could use databases with different scale and/or performance.
You might also be able to add features to specific clients. For example, one client might want a read-only copy of his/her data in another location.
Having separate databases lets you decide these things on a client-by-client basis.
One other option to consider here is that you might have different requirements even for a single client organization. They might have different environments (ie: Production, UAT, Test, etc.) that require different capabilities.
Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.
One client gets to the point that they can't get their work done because of the overhead being placed on the single database by another tenant.
If you have a situation where the load placed by different tenants varies, you are going to find life much easier if you have separate databases.
If you use a multi-tenant design, you will need to consider how to move one tenant to another database if that's required. (I've been in ISVs where this is only considered when someone starts complaining but it needs to be part of the design from day #1).
When you have a multi-tenant database, almost every table will have a tenant ID of some type, and these columns will be involved in almost every join operation.
It's not hard to imagine that these databases simply perform worse. Note that I'm not talking about single vs multi-database on a single SQL Server. In those situations, there can also be positive performance outcomes from a single database design – but that's a topic for another day.
If you have one client that needs to roll back their data to an earlier time, this is often extremely difficult with single-database designs. It's trivial to achieve with multi-database designs.
The same applies to situations where clients have different retention policies, or where there is a need for a client to be able to export their own data (or import it again).
You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.
Is that an appropriate risk for your organization?
One model that I'm seeing more commonplace now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation in regards to data ownership. It might also help if there are data sovereignty issues.
Take legal advice on this.
As I mentioned, this isn't a simple decision. Nowadays for cloud-based PaaS (Platform as a Service) databases to support SaaS applications though, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client every time.