Architecture: Software as a Service Databases – Single DB or DB per Client

I have a varied client base. Many of my customers are traditional larger financial organizations. But I also work with start-ups and software houses. (Microsoft calls these ISVs – Independent Software Vendors).

Most software houses don't want to sell software any more. Instead, they want to repackage their solutions as services, i.e., they want to offer Software as a Service (SaaS).
Converting an application to a service is not easy. Even Microsoft had false starts with moving their on-premises applications to cloud-based offerings.
Lately, I've spent a lot of time considering Microsoft's Well-Architected Framework (WAF). It provides pillars of guidance targeted at architectural excellence. There is a specific article that covers Multitenant SaaS on Azure. Part of that discussion is around the use of Azure SQL Database and Elastic Pools. This is an area that's often missed in these discussions, and it's the database aspects of that topic that I want to drill into further in this post.

Supporting SaaS with Databases

Databases are dear to my heart. It's critical to get the database aspects of a SaaS project right, and to do that early on.
 
On-premises applications are mostly single-tenant. They support a single organization. We do occasionally see multi-tenant databases. They hold the same types of information for many organizations.
 
But what about SaaS-based applications? By default, you'll want to store data for many client organizations. Should you create one large database that holds data for everyone? Should you create a separate database for each client? Or should you create something in between?

As with most things in computing, there is no one simple answer to this.

Here are the main decision points that I look at:

Isolation

For me, this is the #1 item. You have to decide how important it is to isolate one client's data from others'. Developers always tell me "the app does that".
 
In multi-tenant databases, you are always only a single WHERE clause away from showing one client's data to another client. Writing clean code with great separation is much harder in multi-tenant databases.
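
As a small sketch of what that risk looks like (the table, column, and parameter names here are hypothetical), the difference between safe and catastrophic can be one missing predicate:

-- Intended query: return only the current client's invoices
-- (@TenantID would be a parameter passed in by the application)
SELECT InvoiceID, InvoiceDate, TotalAmount
FROM dbo.Invoices
WHERE TenantID = @TenantID;

-- The same query with the tenant filter forgotten: returns every client's invoices
SELECT InvoiceID, InvoiceDate, TotalAmount
FROM dbo.Invoices;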
 
You need to start by considering what would happen if this went wrong. For some ISVs, this would be deeply embarrassing but still manageable. For other ISVs, this would be terminal for the organization.
 
Imagine the discussion between your CEO and a client after you showed their data to someone else. How would that discussion go? Would you still have a job? Would you still have an organization?
Image by Dmitry Ratushny

If you have even the slightest doubt about this, use a separate database per client. You still have the chance to mess that up, but you are starting in a better place.

Price

This one is pretty easy. A single larger database still costs less than a large number of smaller databases. If cost is your main issue, you will lean towards single-database designs.
 
Note: Azure SQL Database offers elastic database pools. Consider them before making your decision.

Cloudiness

A key aspect of "cloudiness" is resource granularity. Having separate databases for each client is much more "cloudy". Multi-database designs have many advantages. Each of your clients might need different levels of performance. Or they might need different features. This is easy in a multi-database design.
 
You might also be able to add (and on-sell) features to specific clients. For example, one client might want a read-only copy of their data in another location. Other clients don't need that.
 
Having separate databases lets you decide these things on a client-by-client basis. You can scale the databases up and down to different levels, for each client.
 
And it can even vary within a single client organization. They might require different environments (i.e.: Production, UAT, Test, etc.). Each environment can have different capabilities and features.
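
For example, with Azure SQL Database, scaling an individual client's database to a different service objective is a single statement (the database name and tier here are just placeholders):

-- Scale one client's production database to a higher service objective
ALTER DATABASE [Client001_Production]
MODIFY (EDITION = 'Standard', SERVICE_OBJECTIVE = 'S3');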

Noisy Neighbors

Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.

Image by Nik Shuliahin
One client can't get their work done because of the overhead from another client. They don't realize they are using the same database as someone else.
 
These problems are far easier to solve with separate databases.
 
If your clients are all in a single database, moving one client to another database can be hard. You need to plan for that from day #1.

Query Performance and Complexity

With multi-tenant databases, almost every table will contain the tenant ID. This column will be part of almost every join operation.
It's easy to see why these databases tend to perform worse.
 
The code you need to write is also more complex.
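
As a rough sketch (with hypothetical table names), even a simple two-table join ends up carrying the tenant key in both the join and the filter:

-- @TenantID would be a parameter passed in by the application
SELECT o.OrderID, o.OrderDate, c.CustomerName
FROM dbo.Orders AS o
INNER JOIN dbo.Customers AS c
    ON c.TenantID = o.TenantID
    AND c.CustomerID = o.CustomerID
WHERE o.TenantID = @TenantID;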

DevOps and Deployment Friendliness

A multi-database model is much more DevOps friendly. Small per-client databases can often be "spun up" and then destroyed again much faster than larger databases.
It is much easier to give each developer their own environment with their own databases during development and testing.
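
As one sketch of how quick that can be with Azure SQL Database (the server and database names are placeholders), a developer or test copy of a single client's database is just one statement:

-- Run in the master database of the target logical server
CREATE DATABASE [Client001_DevCopy]
AS COPY OF [productionserver].[Client001_Production];
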
A single large database is obviously easier to manage, but you should be looking to automate the deployment and core management of your databases anyway.

Recovery/Rollback/Import/Export/Retention

Have you considered what happens when an "oops" moment occurs? One client wants to roll back their data to an earlier point in time. This can be very complex to achieve with single-database designs. It's trivial to achieve with multi-database designs.
 
What if different clients have different data retention policies?
What if clients want to be able to import/export their own data?
 
You can do all these things with both designs, but they are much simpler with multiple databases.
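
As a sketch of why per-client databases make this so much simpler (the names and times are placeholders, and this is on-premises-style syntax; with Azure SQL Database you'd use its built-in point-in-time restore instead), rolling one client back is just a restore of that client's own database:

-- Restore a single client's database to just before the "oops" moment
RESTORE DATABASE [Client001_Production]
FROM DISK = N'D:\Backups\Client001_Production_Full.bak'
WITH NORECOVERY, REPLACE;

RESTORE LOG [Client001_Production]
FROM DISK = N'D:\Backups\Client001_Production_Log.trn'
WITH STOPAT = '2022-08-01T10:30:00', RECOVERY;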

Data Ownership

You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.

Is that an appropriate risk for your organization?

One model that I'm seeing become more commonplace now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation with regard to data ownership. It might also help if there are data sovereignty issues.

Take legal advice on this.

Summary

As I mentioned, this isn't a simple decision. But nowadays, when choosing cloud-based PaaS (Platform as a Service) databases to support SaaS applications, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client.

Cosmos Down Under podcast 4 with Guest Will Velida

I've just finished publishing another show for the Cosmos Down Under Podcast. This time the guest is Will Velida.

Will is an Engineer at Microsoft working in the Fast Track for Azure team. His focus is on application development and modernization. He is passionate about helping developers from all walks of life build great things using Azure. Prior to joining Microsoft, he worked as a software engineer in the financial and agricultural sectors. He was also a Microsoft MVP in the Data Platform category, focusing on Azure Cosmos DB.

In this show, Will discusses how data is stored in Azure Cosmos DB. In particular, we discuss containers and partitioning.

I hope you enjoy it. You'll find it here: Cosmos Down Under Podcast.

T-SQL Tuesday 153: The Conference that Changed Everything for Me

This month, T-SQL Tuesday is hosted by an old friend of mine, Kevin Kline. He's an old friend because I've known him a long time, not because of age. (There are lots of miles left in Kevin.) The invite asked about a conference or event that changed your life or created an opportunity. It could be career related or in another area.

Background

I was invited to join the Microsoft Regional Director program back in the early 2000s. The RD program had grown out of the original DevDays events, and the RDs were the people who spoke at those events. By the time I joined, they were commonly used as speakers at events like TechEd around the world.

In fact, unlike the MVP program, the RD program came with an agreement. An MVP award is for what you've done in the previous year. There are no expectations of what you'll do during the year you are awarded, apart from basically not being a horrible person. The RD agreement, by contrast, covered what you would do in the upcoming period. Speaking at tier-1 events was expected, and you were also expected to be in the top 10% of speaker ratings at those events.

Tough crowd to join!

TechEd USA events

I was used to technical training, and I spoke at many TechEd and other events around the world. However, at one of the first TechEd events in the USA, I remember signing up for the provided speaker training. I can't tell you the effect that had on me.

Microsoft had set aside a full conference room, with lighting, etc. just for the training, and they had an amazing speaker coach (Richard Klees) who worked with you one-on-one. Interestingly, his entire focus was on "big room" speaking, not on presentations in general. Speaking in big rooms is different. I remember him pointing out that the Rolling Stones don't play coffee shops, and if they did, they would act entirely differently.

Confronting Impact

I spoke to several other RDs who had attended the training with him before and who did not like him. Ironically, I loved working with him. Others said they found his training far too confronting. They didn't seem to understand that he wasn't there to be your friend. He was there to change just enough things to help you make a better impact, without trying to totally change how you speak. He didn't want to turn you into a basket-case a day or so before your session or sessions.

There were so many things I learned in those sessions. A big issue for me was that I was too softly spoken for a big room. By the time he was happy, I felt like I was pretty much shouting, and he still rated my energy level at about 5 out of 10.

By the time it was finished, I was fairly shattered and completely exhausted. Did it improve my sessions? Absolutely no doubt about it. There are so many things he said that are stuck in my mind still.

TechEd Australia Keynote

I loved speaking at TechEd when it ran. I remember presenting in various locations in the USA, Barcelona, Hong Kong, Australia, New Zealand, and more. Working with the developer evangelists from around the world was a highlight. And of course our local ones like Chuck Sterling, Andrew Coates, Dave Glover, etc. were highlights.

The 2004 event in Australia was notable for me, because as well as my sessions, I presented part of the keynote with Eric Rudder, a Microsoft senior vice-president of server and tools. I ended up spending quite a bit of time at various events with Eric during that week.

It was fascinating to see the contrast between the social Eric at user events, and the business Eric in the rehearsals for the keynote, where he was razor-sharp focussed on the delivery we were all going to make. People attending those keynotes really had no idea how much work and rehearsing went into delivering them. The section that Terry Clancy and I delivered was pretty smooth. You'd hope so, given the number of times we rehearsed it over many nights. Let's just say that I'm glad I wasn't part of the SharePoint section.

When the event arrived, I remember watching Eric psyching himself up to present, and the level of energy he put into each of us backstage. I was pleased with our section and Eric seemed super happy backstage afterwards.

Book Review: SQL Server Query Tuning and Optimization

I was pleased to be sent a pre-release copy of Benjamin Nevarez's new book SQL Server Query Tuning and Optimization. Last time, I reviewed his High Performance SQL Server book.

This book seems to be somewhat new and somewhat an update, but this time with the main focus on query tuning and optimization. That's pleasing, as the main way to get better performance out of SQL Server is to fix the queries, much more so than anything to do with the hardware or server configurations that so many people focus on.

I've known Benjamin for a long time. He's a very skilled SQL Server professional. And again the technical reviewer for the book is another very skilled old friend in Mark Broadbent. Brandon Leach, Ajmer Dhariwal, and Artur Wysozanski were also technical reviewers. Once again, I had high expectations.

And once again, I wasn't disappointed.

The book covers a rich set of topics, including most things you need to know when performing query optimization for SQL Server. It provides a detailed background to each area, and then recommendations on how to proceed.

Once again, it was so pleasing to read a technical book that was well written, used English well, and wasn't full of technical errors. Nowadays, there aren't many technical books that fall into this category. Several times lately, I've been sent technical books for review, and I've ended up telling the publisher that they don't want me to write a review as those books were so poorly written. It's important to support people who are still writing quality technical books.

Now, were there again areas that I disagreed with Benjamin on?

Yes.

I just don't share the enthusiasm for the in-memory OLTP options in SQL Server, even for temporary objects. I wish it wasn't so, but I've spent so much time trying them over the years, and let's just say, I don't use them. The SQL Server team spent so much money on in-memory OLTP, and in its current form, even in SQL Server 2022, it just doesn't deliver. I think it's one of Bob Ward's (product team) dreams to one day have me happy with it, but that hasn't happened yet. It shines in a few very niche areas, but for most customers, I tell them to avoid it.

Another area that I'd also differ on now is the Data Collector. I really don't think this has much future. Benjamin covers it still in this version of the book, but I'd strongly suggest that customers give it a miss.

Were there again areas where I'd love to see the book improved?

Yes.

Here are some suggestions:

  • I'd like to see statement terminators in all the code (personal bugbear).
  • The order of the topics in theory makes sense, but again in practice, I'm not so sure. I'm concerned that most developers who pick up this book won't get through the theory to then find the gold. Perhaps there's a need for another book that just lists all the common issues for query tuning, and then for each one, explains what's going on. Alternatively, it might have made for an interesting appendix, to have a list of common issues, and for each to tell you which parts of the book are most relevant. So many developers are just looking for quick answers, and aren't going to read the whole book to find them.
  • I'd love to see more info on working with captured traces. There is so much value in analyzing both the core queries and the normalized versions of those queries. Nowadays, it's the number one thing I start looking at when doing performance tuning work on systems that I haven't worked on before.
  • I'd like to see the book call out even more strongly the importance of appropriate application design. In my work, I find that at least 70% of the SQL Server performance-related issues that I run into are application design issues.

The Verdict?

Overall though? Again, a wonderful book. 9 out of 10.

It should be released very soon: https://packt.link/AbvGr

Want to learn more?

If you'd like to learn much more about tracing queries and improving your T-SQL, check out our online courses at https://training.sqldownunder.com

SDU Tools v22 is now available (finally)

One of our popular free resources is the SDU Tools library. If you haven't checked it out, I'd encourage you to do so. It's a large library of functions, procedures, and views all written in native T-SQL code.

You can easily use it as a complete library, or use it as examples of how to write T-SQL code. v22 is now available for download.

If you aren't on our notification list, you can add yourself here:

https://sdutools.sqldownunder.com

I'm sorry it's taken a while longer to get this version out than I would have liked, but we've finally caught up after a really busy period.

In v22, we've added the following:

  • Calculate Age in Months – think it's easy to work out ages? Just use DATEDIFF? If you think so, you'd be mistaken. DATEDIFF could even tell you that someone who is one day old is a month old. That's not true where I live. This function fixes that. (See the example after this list.)
  • Last SQL Server Restart (work out when SQL Server was last restarted)
  • Languages (a great list of all the world's languages, categorized into their language families; it includes the name of each language in English and in its native language, and has the two and three character ISO codes for them)
  • List Constraints with System Names (you know you should name constraints explicitly and not let the system do it. This tool helps you find the ones that have slipped through)
  • File Path To File Extension, File Path To File Name, File Path To Folder Path – these functions take a full file path and extract out the extension, the file name, and the folder. They work with local file paths, UNC paths, and with Windows and Unix path delimiters.
  • List Untrusted Check Constraints, List Untrusted Foreign Keys – we previously provided tools to attempt to re-trust untrusted check and foreign key constraints, but we realised we did not have a tool to find them in the first place. One new tool lists any check constraints that are currently untrusted. The other new tool does the same for untrusted foreign keys.
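
On the Calculate Age in Months point above, here's a quick illustration (with placeholder dates) of why DATEDIFF alone isn't enough:

-- DATEDIFF counts month boundaries crossed, not complete months
SELECT DATEDIFF(MONTH, '20220831', '20220901') AS ClaimedMonths;
-- Returns 1, even though only one day has elapsed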

We've also done the usual updates to lists of SQL Server versions and builds, and a few patches to existing tools.

We hope it's useful.

Why is Greg holding a book about a duck?

One weekend many years ago, my youngest daughter Erin was looking for something to do. She was a very creative child so I suggested "why don't you write a book?"

She said she could write one, if she only had a title. I told her that you could write a book about almost any title. I randomly picked:

What the duck didn't see

(with the emphasis on didn't)

To get her started, I wrote some content, then asked her to continue. She did the same, and then I wrote some more. It turned out to be quite fascinating. I had no idea where she was taking the story and I'd be excited to read what she'd written. Along the way, my eldest daughter Kirsty wrote some content as well. My second daughter Andrea's name was used for the main person in the story.

When life intervened, we hadn't quite finished it, and quite a while elapsed. So last year, I thought it was time to complete it. And this (relatively short) book is the result. I also thought it would be great for the two of them to be "published authors".

It's the story of Andrea Blowhard, a new detective in Lyttleburg looking to make a great impression on her new boss, when a fascinating case falls right into her lap. You can find it here:

I hope you enjoy it. But more importantly, I wanted to share this as a concept that you might consider with your own children.

 

Azure SQL Database now has an improved STRING_SPLIT!

I get pretty excited about new T-SQL enhancements. Back in 2016, I was so pleased to see Microsoft finally add a string split option to T-SQL, but my enthusiasm was limited when I saw how it was implemented. Now that's mostly fixed!

While it was possible to build functions that did string splitting just the way I wanted, the problem was that no matter how you implemented them, they were really just too slow. And most importantly, much slower than a built-in function would be.

The Issues

SQL Server 2016 allowed us to write code like this:
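
Something along these lines (the string and separator here are just placeholder values):

SELECT value
FROM STRING_SPLIT(N'apple,pear,banana', N',');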

That code would return this output:
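
With the placeholder values above, that's a single column called value (and row order isn't guaranteed, a point I'll come back to):

value
------
apple
pear
banana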

STRING_SPLIT is a table-valued function. Originally, it took just two parameters:

  • The string to be split (Unicode or not)
  • The separator

It returned a set of rows with a single column called value, and it was fast.

So what were the problems?

  • The separator could only be a single character
  • Often you need to trim the values being returned. (That can obviously be done outside this function using TRIM but it'd be nice if there was an option to do it in the function).
  • But the biggest issue of all was that because a table-valued function doesn't have an inbuilt order for returning rows, you didn't know which extracted token was which.

Now a TVF returns a table, and tables don't have built-in orders either. Nor do views. But what was needed was another column that told you which row was which, and you could then use that for ordering when needed.

Azure SQL Database

STRING_SPLIT in Azure SQL Database now has an optional parameter to enable an ordinal (i.e. a position).

That extra parameter is a bit or an int that says if an ordinal value should be added to the output.

So this code:
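
As a sketch, using the same placeholder string but with the new ordinal parameter enabled:

SELECT value, ordinal
FROM STRING_SPLIT(N'apple,pear,banana', N',', 1);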

Now returns:
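
With those placeholder values:

value    ordinal
------   -------
apple    1
pear     2
banana   3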

I love working with Azure SQL Database for many reasons, but one of those is that you get new language extensions before other parts of SQL Server. I've been using this feature since the day that I first heard it existed, and it's working great. Now the updated documentation is online as well.

It also means that I can obtain any specific token that I want. For example, get me the third token in the string:
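
A sketch with the same placeholder string:

SELECT value
FROM STRING_SPLIT(N'apple,pear,banana', N',', 1)
WHERE ordinal = 3;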

And that returns:
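
For that placeholder string:

value
------
banana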

This is a much-needed enhancement.

ADF: Use MSIs not SQL Logins whenever possible

Azure Data Factory (ADF) is great for moving data around. It often needs to connect to SQL databases. If you're creating linked services for this, please try to avoid using SQL logins. Don't use usernames and passwords for authentication if you can avoid them.
 
Azure SQL Database lets you create a user from a data factory's managed service identity (MSI). MSIs are a special type of Service Principal (SP). They provide an identity where you don't need to manage the ID or any sort of password. Azure does that for you.
 
Each ADF exposes an Azure-based identity on its property page. Here is the main menu for my BeanPerfection data factory:
Data factory properties page
You might have guessed you should go to the Managed identities menu option, but that is for User Managed identities. I'll write more about them another day. The identity we're interested in is a System Managed Identity.
On this properties page, it's actually called the Managed Identity Application ID. Like all other types of Service Principal, Managed Service Identities have an ApplicationID.

Creating a user in Azure SQL Database

I create a user in an Azure SQL Database with the FROM EXTERNAL PROVIDER option. This says that Azure SQL isn't performing the authentication.

The user name is the full name of the ADF. In Azure SQL Database, we can just use the name of the ADF rather than its Application ID; Azure SQL will resolve that for us.
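
As a minimal sketch of the statements involved (using the data factory and role names from this post, and creating the role first if it doesn't already exist):

-- Run while connected to the target Azure SQL Database
-- with an Azure AD-authenticated account
CREATE USER [beanperfectiondatafactory] FROM EXTERNAL PROVIDER;

CREATE ROLE datafactory_access;

ALTER ROLE datafactory_access ADD MEMBER [beanperfectiondatafactory];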

I always make this data factory name lower-case. Once the user is created, I add it to whatever role in the database makes sense. In my example above, I used the role name datafactory_access but there's nothing special about that name.

You need to decide which role to add it to based upon the tasks that the data factory needs to perform. While it's temptingly easy to just add it to db_owner, try to resist that. If in doubt, create a role that's used for data factory access and grant it permissions as they are needed.

Must execute using an AAD-Based Connection

If you just try to execute the statements above, you might find that you get an error message saying:

Msg 33159, Level 16, State 1, Line 8
Principal 'beanperfectiondatafactory' could not be created. Only connections established with Active Directory accounts can create other Active Directory users.

You cannot create any type of Azure AD-based user in an Azure SQL Database, if your connection was authenticated as a SQL login. You must use a connection that was itself made using Azure-AD authentication.

I find that the easiest way to do that is to make sure I have an Azure Active Directory Admin assigned for my Azure SQL Server, and then just execute the code right in the Azure Portal. I use the Query Editor tab in the main menu for the Azure SQL Database and connect as that administrator.

 

SQL: Fix: The parameters supplied for the procedure "sp_set_firewall_rule" are not valid.

I often use the procedure sp_set_firewall_rule to set firewall rules for Azure SQL Server. (There's a similar call to set the firewall for databases). The other day though, I got an error that had me puzzled:
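
A call along these lines reproduces it (the rule name matches the one discussed below; the IP addresses are placeholders):

-- Executed in the master database of the Azure SQL logical server
EXECUTE sp_set_firewall_rule 'TestRule', '192.0.2.1', '192.0.2.10';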

I also tried it with named parameters and got the same error.

When I looked at my previous scripts, I realised that I had used a Unicode string for the first parameter previously.

Solution

I changed 'TestRule' to N'TestRule' and it worked fine.

I've not seen a procedure before that wants a Unicode string but won't accept an ASCII string. For example, this works just fine:
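
As one illustrative call (not the one from the original post), sp_help declares its @objname parameter as nvarchar, yet happily accepts a plain varchar string:

EXECUTE sp_help 'sys.objects';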

Apparently strict data type checking has always been a feature of extended stored procedures and this one checks specifically for a Unicode string. I really don't use extended stored procedures much anyway.

What threw me as well is that I couldn't find it in the list of system stored procedures. It's not there because it's actually an extended stored procedure. These used to mostly have xp prefixes, not sp prefixes, and are a good example of why I don't love using prefixes like that.

I really wish though, that this procedure had a better error message. While the current one is strictly correct, it is not actually all that helpful.

 

New Online Course Released: Advanced T-SQL for Developers and DBAs

I'm really pleased to let you know that our latest online on-demand course, Advanced T-SQL for Developers and DBAs, is now released.

To celebrate the release, you can get 25% off the pricing until Aug 14th by using coupon code ATSRELEASE
 
We've had so many requests from customers to bring this course to our online platform. It was always one of our most popular in-person courses, and it's now released and fully updated.
 
The course includes the instruction plus quizzes and the same hands-on labs that we use in the in-person courses.
We've made a big effort with this course to make it really easy for you to do the labs. The labs only require you to have a fairly recent version of SQL Server Management Studio (SSMS) installed to complete them. You don't need to install anything else. We've provided the required databases online ready for you to connect to.
 
Not on the latest version of SQL Server? Not a problem either. Unlike most other courses, our courses always cover at least all the supported versions of SQL Server and show you what's changed between versions.
 
I hope you enjoy it.