Thoughts from Data Platform MVP and Microsoft RD – Dr Greg Low
Architecture: Software as a Service Databases – Single DB or DB per Client
I have a varied client base. Many of my customers are traditional larger financial organizations. But I also work with start-ups and software houses (Microsoft calls these ISVs – Independent Software Vendors).
Most software houses don't want to sell software any more. Instead, they want to repackage their solutions as services, i.e. they want to offer Software as a Service (SaaS).
Converting an application to a service is not easy. Even Microsoft had false starts with moving their on-premises applications to cloud-based offerings.
Lately, I've spent a lot of time considering Microsoft's Well-Architected Framework (WAF). It provides pillars of guidance targeted at architectural excellence. There is a specific article that covers Multitenant SaaS on Azure. Part of that discussion is around the use of Azure SQL Database and Elastic Pools. The database side is often missed in these discussions, and it's the part that I want to drill into further in this post.
Supporting SaaS with Databases
Databases are dear to my heart. It's critical to get the database aspects of a SaaS project right, and to do that early on.
On-premises applications are mostly single-tenant. They support a single organization. We do occasionally see multi-tenant databases. They hold the same types of information for many organizations.
But what about SaaS-based applications? By default you'll want to store data for many client organizations. Should you create a large single database that holds data for everyone? Should you create a separate database for each client? Or should you create something in between?
As with most things in computing, there is no one simple answer to this.
Here are the main decision points that I look at:
For me, isolation is the #1 item. You have to decide how important it is to isolate one client's data from that of others. Developers always tell me "the app does that".
In multi-tenant databases, you are always only a single WHERE clause away from showing one client's data to another client. Writing clean code with great separation is much harder in multi-tenant databases.
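To make the "single WHERE clause" point concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for a real database server. The `invoices` table, the tenant numbers and the function names are all hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical shared (multi-tenant) table: every row carries a tenant_id.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE invoices (tenant_id INTEGER, invoice_no TEXT, amount REAL)"
)
shared.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?)",
    [(1, "A-001", 100.0), (1, "A-002", 250.0), (2, "B-001", 999.0)],
)


def invoices_for_tenant(conn, tenant_id):
    # Correct: the query filters on tenant_id.
    return conn.execute(
        "SELECT invoice_no FROM invoices WHERE tenant_id = ? ORDER BY invoice_no",
        (tenant_id,),
    ).fetchall()


def invoices_forgot_filter(conn, tenant_id):
    # Bug: the tenant filter was forgotten, so every tenant's rows come back.
    return conn.execute(
        "SELECT invoice_no FROM invoices ORDER BY invoice_no"
    ).fetchall()


safe_rows = invoices_for_tenant(shared, 1)
leaked_rows = invoices_forgot_filter(shared, 1)  # includes tenant 2's invoice

# Database-per-client: the same forgotten filter cannot cross tenants,
# because the other client's rows are simply not in this database.
tenant_dbs = {}
seed = {1: [("A-001", 100.0), ("A-002", 250.0)], 2: [("B-001", 999.0)]}
for tenant_id, rows in seed.items():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE invoices (invoice_no TEXT, amount REAL)")
    db.executemany("INSERT INTO invoices VALUES (?, ?)", rows)
    tenant_dbs[tenant_id] = db

per_db_rows = tenant_dbs[1].execute(
    "SELECT invoice_no FROM invoices ORDER BY invoice_no"
).fetchall()
```

With separate databases the remaining risk moves to connecting to the wrong database, which is usually a single, central piece of code rather than every query in the application.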
You need to start by considering what would happen if this went wrong. For some ISVs, this would be deeply embarrassing but still manageable. For other ISVs, this would be terminal for the organization.
Imagine the discussion between your CEO and a client after you showed their data to someone else. How would that discussion go? Would you still have a job? Would you still have an organization?
If you have even the slightest doubt about this, use a separate database per client. You still have the chance to mess that up, but you are starting in a better place.
Cost is a pretty easy one. Single larger databases still cost less than large numbers of smaller databases. If cost is your main issue, you will lean towards single-database designs.
Note: Azure SQL Database offers elastic database pools. Consider them before making your decision.
A key aspect of "cloudiness" is resource granularity. Having separate databases for each client is much more "cloudy". Multi-database designs have many advantages. Each of your clients might need different levels of performance. Or they might need different features. This is easy in a multi-database design.
You might also be able to add (and on-sell) features to specific clients. For example, one client might want a read-only copy of their data in another location. Other clients don't need that.
Having separate databases lets you decide these things on a client-by-client basis. You can scale the databases up and down to different levels, for each client.
And it can even vary within a single client organization. They might require different environments (e.g. Production, UAT, Test). Each environment can have different capabilities and features.
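One way to picture this per-client, per-environment granularity is a small tenant catalog. The sketch below is only an illustration: the client names, tier labels, region label and the `DatabaseSettings` type are all hypothetical, and the tiers are not real Azure SKU names.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DatabaseSettings:
    service_tier: str                           # hypothetical label, not a real SKU
    read_replica_region: Optional[str] = None   # None means no read-only copy


# One entry per (client, environment): each database is sized on its own.
catalog = {
    ("contoso", "production"): DatabaseSettings("large", "secondary-region"),
    ("contoso", "test"): DatabaseSettings("small"),
    ("fabrikam", "production"): DatabaseSettings("medium"),
}


def settings_for(client, environment):
    return catalog[(client, environment)]


def scale(client, environment, new_tier):
    # Change one database's tier; every other entry is left untouched.
    key = (client, environment)
    catalog[key] = replace(catalog[key], service_tier=new_tier)


# Scale up one client's test environment ahead of a load test.
scale("contoso", "test", "large")
```

In a single shared database there is only one set of these settings, so every client gets the same tier and the same features.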
Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some time.
One client can't get their work done because of the overhead from another client. They don't realize they are using the same database as someone else.
These problems are far easier to solve with separate databases.
If your clients are all in a single database, moving one client to another database can be hard. You need to plan for that from day #1.
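As a rough sketch of what moving one client out of a shared database involves, the example below copies a tenant's rows into a new database and then deletes them from the shared one. It uses Python's sqlite3 module and a single hypothetical `orders` table; a real system has many related tables, foreign keys and in-flight writes to deal with, which is where the difficulty comes from.

```python
import sqlite3

# Hypothetical shared database holding two tenants' orders.
shared = sqlite3.connect(":memory:")
shared.execute(
    "CREATE TABLE orders (tenant_id INTEGER, order_id INTEGER, item TEXT)"
)
shared.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 1, "widget"), (1, 2, "gadget"), (2, 1, "sprocket")],
)
shared.commit()


def move_tenant_out(source, tenant_id):
    """Copy one tenant's rows to a new database, then remove them from the source."""
    target = sqlite3.connect(":memory:")
    # The tenant_id column is no longer needed once the tenant has its own database.
    target.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, item TEXT)")
    rows = source.execute(
        "SELECT order_id, item FROM orders WHERE tenant_id = ?", (tenant_id,)
    ).fetchall()
    target.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    target.commit()
    # Delete from the source only after the copy has been committed.
    source.execute("DELETE FROM orders WHERE tenant_id = ?", (tenant_id,))
    source.commit()
    return target


tenant_1_db = move_tenant_out(shared, 1)
moved = tenant_1_db.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
remaining = shared.execute("SELECT tenant_id, item FROM orders").fetchall()
```

Every table that holds tenant data needs the same treatment, in an order that respects its relationships, so the effort grows with the size of the schema.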
Query Performance and Complexity
With multi-tenant databases, almost every table will contain the tenant ID. This column will be part of almost every join operation.
It's easy to see why these databases tend to perform worse: keys and indexes are wider, and every query has to filter or join on the extra column.
The code you need to write is also more complex.
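The join point can be sketched with a small example, again using Python's sqlite3 module and hypothetical `customers` and `orders` tables. It assumes customer IDs are only unique within a tenant, which is a common but not universal design.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE customers (tenant_id INTEGER, customer_id INTEGER, name TEXT,
                            PRIMARY KEY (tenant_id, customer_id));
    CREATE TABLE orders (tenant_id INTEGER, order_id INTEGER, customer_id INTEGER,
                         PRIMARY KEY (tenant_id, order_id));
    -- Both tenants happen to have a customer with customer_id = 1.
    INSERT INTO customers VALUES (1, 1, 'Alice'), (2, 1, 'Bob');
    INSERT INTO orders VALUES (1, 10, 1), (2, 20, 1);
    """
)

# Correct: tenant_id is part of the join condition as well as the filter.
correct = db.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c
      ON c.tenant_id = o.tenant_id AND c.customer_id = o.customer_id
    WHERE o.tenant_id = ?
    ORDER BY c.name
    """,
    (1,),
).fetchall()

# Bug: tenant_id left out of the join, so tenant 1's order also matches
# tenant 2's customer with the same customer_id.
wrong = db.execute(
    """
    SELECT o.order_id, c.name
    FROM orders AS o
    JOIN customers AS c
      ON c.customer_id = o.customer_id
    WHERE o.tenant_id = ?
    ORDER BY c.name
    """,
    (1,),
).fetchall()
```

In a database-per-client design, neither the composite keys nor the extra join condition would be needed.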
DevOps and Deployment Friendliness
A multi-database model is much more DevOps friendly. Single smaller databases can often be "spun-up" and then destroyed again, much faster than larger databases.
It is much easier to give each developer their own environment with their own databases during development and testing.
A single large database is obviously easier to manage, but you should be looking to automate the deployment and core management of your databases anyway.
Have you considered what happens when an "oops" moment occurs? One client wants to roll back their data to an earlier point in time. This can be very complex to achieve with single-database designs. It's trivial to achieve with multi-database designs.
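The rollback case can be sketched with Python's sqlite3 module, whose `Connection.backup` method copies a whole database. This is only a stand-in for a real point-in-time restore, and the client names and `notes` table are hypothetical. The point is that the restore unit is the database, so restoring one client never touches another.

```python
import sqlite3


def new_tenant_db(rows):
    # Build a small hypothetical per-client database.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE notes (body TEXT)")
    db.executemany("INSERT INTO notes VALUES (?)", [(r,) for r in rows])
    db.commit()
    return db


tenants = {
    "client_a": new_tenant_db(["a1", "a2"]),
    "client_b": new_tenant_db(["b1"]),
}

# Take a snapshot of client_a's database only.
snapshot = sqlite3.connect(":memory:")
tenants["client_a"].backup(snapshot)

# The "oops" moment: client_a loses its data, while client_b keeps working.
tenants["client_a"].execute("DELETE FROM notes")
tenants["client_a"].commit()
tenants["client_b"].execute("INSERT INTO notes VALUES ('b2')")
tenants["client_b"].commit()

# Roll client_a back by copying the snapshot over its database.
snapshot.backup(tenants["client_a"])

count_a = tenants["client_a"].execute("SELECT COUNT(*) FROM notes").fetchone()[0]
count_b = tenants["client_b"].execute("SELECT COUNT(*) FROM notes").fetchone()[0]
```

In a shared database, the equivalent rollback would mean restoring a copy elsewhere and then merging only one tenant's rows back in, without undoing the work other clients have done since.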
What if different clients have different data retention policies?
What if clients want to be able to import/export their own data?
You can do all these things with both designs, but they are much simpler with multiple databases.
You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.
Is that an appropriate risk for your organization?
One model that I'm seeing more often now is that while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation with regard to data ownership. It might also help if there are data sovereignty issues.
Take legal advice on this.
As I mentioned, this isn't a simple decision. Nowadays though, for cloud-based PaaS (Platform as a Service) databases that support SaaS applications, I'd suggest starting with a separate database for each client every time, unless there is a compelling reason not to.