Opinion: SQL Server Databases in the Cloud – Single Tenant or Multi Tenant?

I spend a lot of time working with software houses (Microsoft calls these ISVs – Independent Software Vendors). More and more, these companies are looking to convert their on-premises applications to cloud-based SaaS (Software as a Service) offerings.

For many of the ISVs, their on-premises applications are single-tenant ie: they are designed to support a single organization. When they are looking to cloud offerings, one of the first decisions is whether they should create a single database to hold the data for all their client organizations, or whether they should create a separate database for each client organization.

As with most things in computing, there is no one simple answer to this.

Here are the main decision points that I look at:

Isolation

For me, this is the #1 item. You have to decide how important isolating one client's data from other clients is. Time and again, I hear how "the app does that" but you need to keep in mind that in most multi-tenant models, you are only one faulty WHERE clause away from showing one client, another client's data. In fact, it's usually far harder to get the code correct in multi-tenant databases.

So I think you need to start by considering what the outcome of that would be. For some ISVs, this would be deeply embarrassing but manageable. For other ISVs, this would simply be terminal for the organization.

Imagine the discussion between your CEO and the client whose data was disclosed. How would that discussion go? Would you still have a job? Would you still have an organization?

Image by Dmitry Ratushny
Image by Dmitry Ratushny

If you have even the slightest doubt about this, you should lean towards single-tenant ie: a separate database per client. You still have the chance to mess that up, but you are starting in a better place.

Price

This one is pretty easy. In every current cloud provider, single larger databases are more economical than large numbers of smaller databases. I don't think that cost should be your primary concern for this, but if it is, you will lean towards single-database designs.

If you are working in Azure SQL Database though, and haven't looked at their elastic database pools, you should consider them before making your decision.

Cloudiness

A key aspect of "cloudiness" is the granularity of providing resources as required, just when they are required. Having separate databases for each client is much more "cloudy". A simple example of this is that each of your clients might require a different level of performance and/or features.

One client might want to run a very low cost test, another might need good solid general performance, another might need the best performance available. If you use separate databases, even clients running the same application could use databases with different scale and/or performance.

You might also be able to add features to specific clients. For example, one client might want a read-only copy of his/her data in another location.

Having separate databases lets you decide these things on a client-by-client basis.

One other option to consider here is that you might have different requirements even for a single client organization. They might have different environments (ie: Production, UAT, Test, etc.) that require different capabilities.

Noisy Neighbors

Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.

Image by Nik Shuliahin
Image by Nik Shuliahin

One client gets to the point that they can't get their work done because of the overhead being placed on the single database by another tenant.

If you have a situation where the load placed by different tenants varies, you are going to find life much easier if you have separate databases.

If you use a multi-tenant design, you will need to consider how to move one tenant to another database if that's required. (I've been in ISVs where this is only considered when someone starts complaining but it needs to be part of the design from day #1).

Query Performance

When you have a multi-tenant database, almost every table will have a tenant ID of some type, and these columns will be involved in almost every join operation.

It's not hard to imagine that these databases simply perform worse. Note that I'm not talking about single vs multi-database on a single SQL Server. In those situations, there can also be positive performance outcomes from a single database design – but that's  a topic for another day.

Recovery/Rollback/Import/Export/Retention

If you have one client that needs to roll back their data to an earlier time, this is often extremely difficult with single-database designs. It's trivial to achieve with multi-database designs.

The same applies to situations where clients have different retention policies, or where there is a need for a client to be able to export their own data (or import it again).

Data Ownership

You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.

Is that an appropriate risk for your organization?

One model that I'm seeing more commonplace now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation in regards to data ownership.  It might also help if there are data sovereignty issues.

Take legal advice on this.

Summary

As I mentioned, this isn't a simple decision. Nowadays for cloud-based PaaS (Platform as a Service) databases to support SaaS applications though, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client every time.

SQL: Finding rows that have changed in T-SQL – CHECKSUM, BINARY_CHECKSUM, HASHBYTES

If you have data in a SQL Server table and you want to know if any of the values in a row have changed, the best way to do that is by using the rowversion data type. (Note: this used to be called the timestamp data type in a rather unfortunate naming choice). I'll talk more about it in another post.

But today I wanted to discuss the another issue. If I have an incoming row of data (let's say @Parameter1, @Parameter2, @Parameter3, @Parameter4) and I want to know if the incoming values match the ones already in the table and update the table only if they are different, what's the best way to do that/

You might think that's easy and you'd just add a WHERE clause to your UPDATE like:

If you have a large number of columns, doing that gets old pretty fast. Worse, if the columns are nullable, it really needs to be more like this:

You can imagine what this looks like if there are a large number of columns, and you can imagine the amount of calculation that could be needed, just to see if one value has changed.

An alternative approach is to add one more column that represents a checksum or hash value for all the columns, and to just compare that.

The first challenge is to get a single value. Fortunately, the CONCAT function provided in SQL Server 2012 and later works a treat. It promotes all values passed to it to strings, ignores NULL values, and outputs a single string. This means we could calculate the hash or checksum like this:

and we could then compare that to the precomputed value we stored when we last inserted or updated the row. (Either directly inserting the value or via a persisted computed column)

It might be worth using an alternate separator if there's any chance it could occur at the end of any value.

If you are using SQL Server 2017 or later, you'd hope that you could use the CONCAT_WS (concatenate with separator) function instead but unfortunately, it ignores NULL values, so you'd have to wrap them all with ISNULL or COALESCE.

It would have been great if that function had a way to not ignore NULL values, or to have another function provided. It would be really useful if you wanted to generate output lines for CSV or TSV files.

The question that remains is about which hashing function to use.

I often see customers trying to use the CHECKSUM function for this. The problem with CHECKSUM is that it returns an int and you have a reasonable chance of getting a collision. That would be bad, as you'd assume that data was the same when it wasn't. A while back, we got the documentation people to write specific notes about this. You'll notice the web page now says:

"If at least one of the values in the expression list changes, the list checksum will probably change. However, this is not guaranteed."

I'm assuming you want something better than that.

BINARY_CHECKSUM sounds better but it's really more like a case-sensitive version of CHECKSUM, and unless you're working with case-sensitive servers, that's even worse.

The real answer is a true hashing function, and in SQL Server that means HASHBYTES. It's computationally more intensive but is suitable for this type of change detection. The hassle with it was that it was limited to 8000 bytes. Fortunately, in SQL Server 2016 and later, that limitation was removed, so it's now the one to use:

The other point made by our buddy Ron Dunn in the comments is extremely valuable. If you are using CONCAT to combine numbers and date/time values, it's critical that you always format them as strings yourself to control the format used. That way, you won't fall foul of different regional or language settings.

 

 

Shortcut: The magical F1 key – help on syntax and metadata

I used to always recommend that people install Books Online (BOL) on their systems. It's ironic that it was called "Online", given we're really talking about "Offline" nowadays, but back when we first were talking about it, we were comparing it to a physical book, not to a live reference on the Internet.

Nowadays though, I find that the version online is so far superior to the one that you can install locally, that I think it's better to just use the online version. I particularly like the way that the online books are now cross-version ie: each page covers all supported versions, instead of having a separate page for each version.

Given there is a lot of information online, what's the quickest way then to find the page that you're after?

For T-SQL commands, I find the best option is to type the command name followed by tsql into the search bar. For example, here's a search for the SET command:

Invariably, the command that you're after will be either the first or second entry.

But there's an even quicker way when you're editing in SQL Server Management Studio (SSMS). In this script that I'm editing:

If I double-click EXECUTE to highlight it, then hit F1, I'm taken directly to the correct page.

That's awesome but the F1 key can do more. If I was looking at or editing this script from WideWorldImporters:

and I'm wondering about the People table, I can highlight the full table name [Application].People, then hit Alt-F1.I'm then returned all sorts of information about the table:

What SSMS is doing is running the sp_help command for that object. In another post, we'll talk about how you could change that if needed, or change what happens with other function keys.

SDU Tools: Create a linked server to Azure SQL Database from SQL Server

When I need to move small amounts of data between an on-premises SQL Server system and an Azure SQL Database, or run queries that involve both systems, the easiest option at present is to create a linked server to Azure SQL Database.

And the easiest way to do that (or if you can't remember the commands) is with one of our free SDU Tools for developers and DBAs, called CreateLinkedServerToAzureSQLDatabase. You can read it to check out the code, or use it like this:

The parameters are as follows:

@LinkedServerName sysname – name that will be assigned to the linked server
– defaults to AzureSQLDB
@AzureSQLServerName nvarchar(max) – name for the Azure SQL Server eg: myserver.database.windows.net
@AzureSQLServerTCPPort int – port number for the Azure SQL Server (defaults to 1433)
@AzureSQLDatabaseName sysname – name of the database (defaults to master)
@RemoteLoginName sysname – login name for the Azure database
@RemotePassword nvarchar(max) – password for the Azure database
@SetCollationCompatible bit – is the remote server collation compatible (default is true)
@SetRPCIn bit – should rpc (remote procedure calls = stored procedure calls) be allowed (default is true)
@SetRPCOut bit – should rpc output be allowed (default is true)

Once you've done that, you will see it in the list of Linked Servers as in the main image above. In that image, I've created one called AzurePublicData. It contains a database called PublicData, and the database contains a number of tables.

You can then query the tables using four part names:

You can see it in action here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

Opinion: Learning to talk to developers – marketing to the right people for SQL Server

Yesterday, I posted about why software houses don't keep up to date with SQL Server releases. That's one issue but there's also a deeper issue. When I saw postings from product group members about this, it was predominantly DBAs that they are talking to.

I think this is targeting entirely the wrong people for this discussion.

For many years now, almost all the discussion from the product group to "customers" seems to have focused on DBAs. These people are often at the end of the chain in many organizations.

Yes these people might be involved in purchasing SQL Server licenses but by the time they see applications, they are often locked into technology stacks.

I'm not sure why it has happened but the product group seems to have lost the ability to talk to the developers, developer leads, and architects who are making these decisions. Almost every piece of marketing seems targeted at the end of the chain.

There is far more importance being placed on things like high availability features than on features that help developers create better applications in the first place, yet without a constant stream of new applications being built for SQL Server, it won't matter how good the high availability story is.

This often leads to bizarre outcomes. I always thought that Service Broker was a perfect example of this. Microsoft delivered an awesome platform for this that should have been a great starting point. But there were no tools, and no prescriptive guidance was provided. Then when it's not used much, the team seems to lose interest and move onto the next shiny feature.

The #1 problem with this though, was that the marketing was targeted at DBAs, most of whom at the time wouldn't have any interest in message-based architectures. The same applies to spatial, full-text, XML, in-memory OLTP, and so many more.

Most still don't get why these matter or much about why you should want to use them, but when you show these same features to a development team lead, their eyes light up.

Another current example is Azure SQL Database. Again, all the marketing seems targeted at DBAs working for existing customers, most of whom are currently using SQL Server on-premises. I have not the slightest doubt that the marketing team knows that's where their main initial income will come from, but when you are targeting these people, the entire discussion will be about what's missing. Existing applications are always going to be the hardest to migrate.

That's why Managed Instances are now seen as the great savior. But that just highlights the real issue.

Instead, I'd love to see the team focusing on developers, developer leads, and architects building greenfield applications. The team needs to learn to speak to these people again and focus on what is there, not what's missing.

They need to know that Azure SQL Database is an amazing platform to build their new applications on.

 

SQL: Why don't software vendors support the latest versions of SQL Server?

There has been quite a bit of discussion online lately about which version of SQL Server new applications should target. Members of the SQL Server product group were saying they can't see any reason why new applications should use anything less than SQL Server 2016 as a real base line.

I'd love to see any new application using the absolute latest version of SQL Server. Unlike the bad old days where you needed to wait for a service pack before using the product, the best tested version of SQL Server is invariably the one that's just released. (And there aren't service packs any more anyway).

However, I spend a lot of time in software houses and I know only too well why they are still working with older versions. There are two aspects to this:

  • Which version that they'll use features from
  • Which version they'll support customers using

The first one is easy. They only want to write code once, and so even though they might be using a later version, they'll keep their code to the earliest version that they support ie: even if an ISV (Independent Software Vendor) is supporting SQL Server 2017, they might have a restriction on using any features later than SQL Server 2012 because they want to be able to deploy on any version from 2012 to 2017.

The second one is tough. After they answer the first question, many then get really slack and don't put the effort in to test and support later versions. I understand that testing is costly and the more versions that you support, the worse this gets.

This is the true pain point though, as unless they do that, their customers will be stuck on those old versions. The problem is multiplied when the customer has to support applications from many vendors concurrently.

There is no way to fix this except pushing the application vendors to test later versions. As well as direct customer pressure, I'd love to see Microsoft reaching out about this more.

 

Shortcut: Apply cut or copy commands to blank lines when there is no selection

When I'm doing a lot of query editing, I often get a bit mesmerized, particularly if there's a lot of manual copy and paste or cut and paste going on.

One thing that often drives me crazy is when I use Ctrl-C (ie: copy) when I meant to use Ctrl-V (ie: paste). Invariably, I do this when I have nothing highlighted at all. So not only did I not get the value pasted, I just copied an empty value into the clipboard.

But SQL Server Management Studio (SSMS) has your back on this. (And so does Visual Studio)

In Tools, then Options, then Text Editor, then Transact-SQL, under the General tab, there's an option for Apply Cut of Copy commands to blank lines when there is no selection.

The default is that it works as expected (like you don't want it to), but if you uncheck it, you might have just saved yourself some annoyance.

If I'm working with this code:

and I highlight the word DECLARE and hit Ctrl-C, then move to the blank line and hit Ctrl-C instead of Ctrl-V, I'd have just lost my first clipboard item. With this option unchecked, I can just smile, and hit Ctrl-V again, and it will still paste:

SDU Tools: Clear Service Broker Transmission Queue in T-SQL

Service Broker is one of my favorite tools in SQL Server. So many applications need to have a transactional queue, and many people try to build them using tables and other objects. But SQL Server has Service Broker and with it, you get the beauty of using the knowledge of someone who already knows about queues having thought about how they need to work.

But while Service Broker is quite forgiving, it's common while developing Service Broker applications to make mistakes and end up with messages in queues that will never be delivered because you had some bug in your code. Service Broker doesn't want to ever just throw your data away.

One of our free SDU Tools for T-SQL developers and DBAs is ClearServiceBrokerTransmission queue. Let's look at an example of using it.

We'll create a new database called Development and use it:

Then create a master encryption key, a queue, and a service:

Now let's send a message that will never be delivered (because the target service doesn't exist):

We can query to see that it's still in the transmission queue:

If we scroll to the right, we can see why it's stuck:

Now we realize that we messed up the service name and it will never be delivered, so we want to flush the queue. We can do that easily:

Note that if you are following along with the code and don't have SDU_Tools installed in that database, you'll need to change the EXEC line to point to a database where they are installed.

And finally, we could just clean up:

You can see this tool in action here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

 

 

SQL: And where did the SSRS Private Assemblies folder move to in VS2017?

Recently, I posted about the templates folder for SQL Server Reporting Services designer moving in Visual Studio 2017. You'll find that blog post here.

I've also now found that the Private Assemblies folder has moved too. I do wish these locations would be fixed, shared, and known rather than needing to play hide and seek when updates are made to the SSRS designer tooling.

The Private Assemblies folder was used to hold a copy of any .NET assemblies that are referenced in your SSRS projects. By using .NET assemblies, you can share logic that's used in reports. It's worth noting that wherever we can, we try to avoid using .NET assemblies for this, and wish that SSRS had a way of dealing with common code inclusions, but that's a topic for another day.

Generally we only use code in reports for things like formatting functions, because we think that reports should render the data, not really calculate the data. (That should be done by a procedure or function on the server).

One of the reasons that we avoid using .NET assemblies for shared logic in reports is that when you do that, you buy yourself deployment issues. Where will the assembly (DLL) live? You basically have two choices:

Install it in the GAC (global assembly cache) – we don't like this one as it pollutes the GAC with application-specific logic.

Install it in both the Report Server's bin directory, and also in the private assemblies area for the SSRS designer. It has to go in two places as the designer needs to be able to find it during design time, and the report server needs to be able to find it at run time.

What has recently moved though, is where the private assemblies folder is located. Previously it was here:

\Program Files (x86)\Microsoft Visual Studio 14.0
\Common7\IDE\PrivateAssemblies\ProjectItems\ReportProject

Now it's either here:

\Program Files (x86)\Microsoft Visual Studio\2017
\SQL\Common7\IDE\CommonExtensions\Microsoft\SSRS\

Or here:

\Program Files (x86)\Microsoft Visual Studio\2017
\Enterprise\Common7\IDE\CommonExtensions\Microsoft\SSRS\

Like last time, I presume the difference is because of how the Visual Studio shell has been installed ie: either by SSDT (the SQL one), or by Visual Studio (the Enterprise one).

 

 

Shortcut: My favorite keyboard shortcuts in SQL Server Management Studio (SSMS)

Visual Studio is a very configurable tool, and particularly in the area of keyboard shortcuts. Because SQL Server Management Studio (SSMS) is based on Visual Studio, it inherits many of these configuration options.
SSMS has a very rich set of keyboard shortcuts. Without trying to cover most of them, I do want to highlight a few that I think are really important to know how to use.

Let's start with an easy set of commands:

You might need to change the case of some values. If I have this code:

I might decide that our new coding standards say that variables that are treated like constants, must be in all capitals and with words separated by underscores. (This is often called SHOUTY_SNAKE_CASE).

I can just highlight the name (or double-click it), the hit Ctrl-Shift-U to make it upper case.

And Ctrl-Shift-L would make it lower case.

Now, what if I want to rearrange the order of these declaration lines. If I needed to move the @A_CONSTANT_VALUE line below the other line, I often see people highlight the whole line, then cut it, then paste it below.

A great keyboard shortcut for doing that, is to hit Alt-Shift-T. No matter where you are on the line, it just moves the line down by one.

If I wanted to then add a line in between these two lines, what I typically see people do is to put the cursor on the D in the second line (at the beginning of the line), then hit Enter to move it down one, then use the up arrow to go back to the newly emptied line.

What you can do instead is put the cursor anywhere on the first line, and hit Ctrl-Shift-Enter.

I'm sure that I'm somewhat anal, and because of this, I also often spend time cleaning up lines. So if I come across a line like the second one here, the spacing grates on me:

The quickest way to clean up the second line is to highlight the whole line, then hit Ctrl-K followed by Ctrl-Backslash.

Notice that it automagically removed all the messy whitespace. It can do multiple lines at once, but it won't rearrange what's on each line.

Finally, if I needed to comment out this code, I'd highlight it and hit Ctrl-K then Ctrl-C.

And Ctrl-K followed by Ctrl-U will uncomment it.

Two more easy shortcuts: Ctrl-Home takes you to the top of the script. Ctrl-End takes you to the bottom of the script.

The final most useful keyboard shortcut is Ctrl-R. It hides or shows the results pane.

You'll find a detailed list of keyboard shortcuts here:

https://docs.microsoft.com/en-us/sql/ssms/sql-server-management-studio-keyboard-shortcuts