I have a varied client base. Many of my customers are traditional larger financial organizations. But I also work with start-ups and software houses. (Microsoft calls these ISVs – Independent Software Vendors).
Most software houses don't want to sell software any more. Instead, they want to repackage their solutions as services, i.e., they want to offer Software as a Service (SaaS).
Converting an application to a service is not easy. Even Microsoft had false starts with moving their on-premises applications to cloud-based offerings.
Lately, I've spent a lot of time considering Microsoft's Well Architected Framework (WAF). It provides pillars of guidance targeted at architectural excellence. There is a specific article that covers Multitenant SaaS on Azure. Part of that discussion is around the use of Azure SQL Database and Elastic Pools. This is an area that's often missed in these discussions, and it's the database aspects of that topic that I want to drill into further in this post.
Supporting SaaS with Databases
Databases are dear to my heart. It's critical to get the database aspects of a SaaS project right, and to do so early on.
On-premises applications are mostly single-tenant: they support a single organization. We do occasionally see multi-tenant databases, which hold the same types of information for many organizations.
But what about SaaS-based applications? By default, you'll need to store data for many client organizations. Should you create one large database that holds data for everyone? Should you create a separate database for each client? Or should you create something in between?
As with most things in computing, there is no one simple answer to this.
Here are the main decision points that I look at:
Isolation
For me, this is the #1 item. You have to decide how important it is to isolate one client's data from another's. Developers always tell me "the app does that".
In multi-tenant databases, you are always only a single WHERE clause away from showing one client's data to another client. Writing clean code with great separation is much harder in multi-tenant databases.
You need to start by considering what would happen if this went wrong. For some ISVs, this would be deeply embarrassing but still manageable. For other ISVs, this would be terminal for the organization.
Imagine the discussion between your CEO and a client after you showed their data to someone else. How would that discussion go? Would you still have a job? Would you still have an organization?
If you have even the slightest doubt about this, use a separate database per client. You still have the chance to mess that up, but you are starting in a better place.
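To make that risk concrete, here's a minimal sketch (hypothetical schema, using SQLite from Python purely for illustration) of how a single missing WHERE clause exposes another tenant's rows:

```python
import sqlite3

# Hypothetical shared table holding rows for every tenant.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (tenant_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 250.0), (2, 999.0)])

def invoices_for_tenant(tenant_id):
    # Correct: every single query must remember the tenant filter.
    return conn.execute(
        "SELECT amount FROM invoices WHERE tenant_id = ?",
        (tenant_id,)).fetchall()

# The hazard: drop that one WHERE clause and tenant 1 sees tenant 2's rows.
leaked = conn.execute("SELECT amount FROM invoices").fetchall()
```

With a separate database per client, a query like the last one can only ever return that client's own rows.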
Cost
This one is pretty easy: single larger databases still cost less than large numbers of smaller databases. If cost is your main concern, you will lean towards single-database designs.
Note: Azure SQL Database offers elastic database pools. Consider them before making your decision.
Cloudiness
A key aspect of "cloudiness" is resource granularity. Having a separate database for each client is much more "cloudy". Multi-database designs have many advantages: each of your clients might need different levels of performance, or different features. This is easy in a multi-database design.
You might also be able to add (and on-sell) features to specific clients. For example, one client might want a read-only copy of their data in another location. Other clients don't need that.
Having separate databases lets you decide these things on a client-by-client basis. You can scale the databases up and down to different levels, for each client.
And it can even vary within a single client organization. They might require different environments (i.e.: Production, UAT, Test, etc.). Each environment can have different capabilities and features.
Noisy Neighbors
Everyone who's used a single database to support a large number of clients has run into the "noisy neighbor" situation at some point.
One client can't get their work done because of the overhead from another client. They don't realize they are using the same database as someone else.
These problems are far easier to solve with separate databases.
Moving Clients
If your clients are all in a single database, moving one client to another database can be hard. You need to plan for that from day #1.
Query Performance and Complexity
With multi-tenant databases, almost every table will contain the tenant ID. This column will be part of almost every join operation.
It's easy to see why these databases often perform worse, and the code you need to write against them is also more complex.
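As a sketch of that extra complexity (hypothetical tables, SQLite for illustration), note how the tenant ID has to appear in both the join predicate and the filter. Customer IDs repeat across tenants, so forgetting either one silently mixes clients' data:

```python
import sqlite3

# Hypothetical multi-tenant schema: customer_id values repeat across tenants.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (tenant_id INT, customer_id INT, name TEXT);
    CREATE TABLE orders (tenant_id INT, order_id INT, customer_id INT, total REAL);
    INSERT INTO customers VALUES (1, 10, 'Alpha'), (2, 10, 'Beta');
    INSERT INTO orders VALUES (1, 100, 10, 50.0), (2, 200, 10, 75.0);
""")

rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers AS c
    JOIN orders AS o
      ON o.tenant_id = c.tenant_id      -- extra join column on every join
     AND o.customer_id = c.customer_id
    WHERE c.tenant_id = ?               -- and the extra filter on every query
""", (1,)).fetchall()
```

In a per-client database, neither the extra join column nor the filter would be needed.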
DevOps and Deployment Friendliness
A multi-database model is much more DevOps friendly. Single smaller databases can often be "spun-up" and then destroyed again, much faster than larger databases.
It is much easier to give each developer their own environment with their own databases during development and testing.
A single large database is obviously easier to manage, but you should be looking to automate the deployment and core management of your databases anyway.
Recovery
Have you considered what happens when an "oops" moment occurs, and one client wants to roll back their data to an earlier point in time? This can be very complex to achieve with single-database designs, yet it's trivial with multi-database designs.
What if different clients have different data retention policies?
What if clients want to be able to import/export their own data?
You can do all these things with both designs, but they are much simpler with multiple databases.
Data Ownership
You need to consider who will own the data that's used by your application. When you use a single-database model with multiple tenants, it's clearly going to be a database that you own and are responsible for.
Is that an appropriate risk for your organization?
One model that I'm seeing become more commonplace is that, while the ISV owns and operates the application, the end customers own their own databases (and pay for them). This might simplify your legal situation with regard to data ownership. It might also help if there are data sovereignty issues.
Take legal advice on this.
Summary
As I mentioned, this isn't a simple decision. For cloud-based PaaS (Platform as a Service) databases supporting SaaS applications, though, unless there is a compelling reason not to, I'd suggest starting with a separate database for each client.
I've been working at a site that uses proxy servers for Internet access, and we were unable to clone a Git repository in Azure DevOps (AzDO) from within Visual Studio (VS). It was quite frustrating.
Visual Studio has proxy settings, and I had configured those:
They live in the system.net node within the settings.
VS was able to get out to the Internet, and I could find and try to connect to projects in AzDO, but I couldn't clone them from within there.
You might see timeouts, or if the connectivity changed after you previously had it working, you might just see "Pull operation failed":
When VS executes Git calls, it makes those calls to an underlying Git layer. Git has its own proxy setting, so even though you can appear to connect and find a project from within VS, you can't perform Git operations (like cloning) using the VS proxy setting alone.
You still need to configure the Git proxy as well.
Open a Git CMD window (i.e. not just a normal command window), and execute:
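The exact command depends on your site's proxy details; a typical form (the proxy address and port below are placeholders, not real values) is:

```shell
# Set Git's own proxy (placeholder address and port - use your site's values)
git config --global http.proxy http://proxyserver:8080

# Confirm the setting took effect
git config --global --get http.proxy
```

After that, Git operations launched from within VS should go through the proxy as well.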
I've had a number of people asking me lately about what's happening with SQL Server certifications. A number of the clients I deal with require their staff to regularly update their certifications; it's been part of their ongoing commitment to training.
In particular, I'm asked: "When will the new SQL Server certifications be available?"
And the (perhaps surprising for some) answer is: They won't be.
The way that Microsoft certifies people has gone through a seismic shift. The entire focus is now on certifying people's ability to perform roles, rather than their ability to use a particular product.
And of course in the past, it wasn't just a particular product, it was a particular version of a particular product.
Role-Based Levels
The new role-based certifications have three basic levels:
Fundamentals – these certifications demonstrate a basic knowledge of an area and do not expire.
Associate – these certifications are more in depth and are targeted at typical professionals.
Expert – as the name says, these are targeted at highly competent individuals.
There are other questions that I've been hearing:
Q: All the data certifications say "Azure" in the name. Will there be any "on-premises" versions?
A: It's a great question, but the answer seems to be "no". The Azure-named exams are intended to cover that knowledge as well. Mind you, most clients that I deal with now have part of their systems in Azure anyway; most have some type of hybrid environment.
An obvious disappointment is the sun-setting of the existing MCSA, MCSD, MCSE style certifications, particularly for people who were part way through acquiring them.
Q: For data, where are the expert certifications?
A: They don't exist yet. I do hope they will, as they give people more to strive for.
You can find details of the new role-based certifications here.
How tough are Expert exams?
It's important to understand that the expert-level exams really are targeted at expert-level candidates. The AZ-400 exam is the DevOps Expert level. What seems to surprise many candidates is that the exam is broader than they imagined. They were expecting an exam that just covered what was included in Azure DevOps as delivered by Microsoft.
To pass, you really need to know your way around other common tools: for example, how to integrate Maven, Gradle, SonarCloud, WhiteSource Bolt, etc.; how to implement OWASP testing; and not just how to integrate GitHub, but concepts like GitFlow. The contention is that an expert won't just know the Azure DevOps parts; they'll be experienced with integrating other commonly used tools.
Three years back, with much fanfare at a partner conference, Microsoft announced the Microsoft Professional Degree program. It was going to be a set of courses that you could take that would lead to one of their professional degrees.
Now, here in Australia, you can't just call something a degree, and I'm guessing that's the same in the USA, so I wasn't surprised when, soon after I started, I noticed that the name had changed to the Microsoft Professional Program (MPP) and they'd dropped the word "degree".
The first program available was for Data Science. It required you to complete 11 courses. Each course restarted every three months, and had an exam and certificate at the end. Importantly, to complete a program, you also had to complete a practical course, called the capstone project.
I loved these programs, and I completed four of them: Data Science, AI, Big Data, and DevOps.
It's not all roses
The program's retirement was announced the other day. You can't enrol after next month, and you must complete everything before December.
Many people are part way through a program, have paid to certify previous exams, and are now unable to complete before the deadline. That's really not fair to them. A nice touch would have been to at least refund the exam certification costs of anyone part way through a program.
And more importantly, what does it really mean for those that have invested time, money, and effort in the programs? I don't know but I'd almost bet that within a year, it'll be hard to even find any trace of the entire program ever existing.
What I don't love is the way that Microsoft has constant churn in these types of things. For things like certification that require substantial commitments to pursue, this type of churn is just not appropriate.
I wish it was the first time that many of us have been down this same path but sadly, it's not. (#MCM4Life)
Microsoft's offerings around learning have been really messy and jumbled for a very long time. The move to refocus learning around Microsoft Learn is a good move. I just wish they'd learn how to make these types of changes and consolidations without leaving their biggest supporters feeling abandoned (again).
Why I liked the MPP
I really liked the format of the MPP for a number of reasons:
You could take any of the courses for free (this meets the goal of the product groups who just want to get the information out there widely and without the friction of cost). Initially, that also included the exams.
You could pay for a certified exam. The courses were done in conjunction with edX, who would check who you were (i.e., via government-issued photo ID). If you wanted the certification, you needed to pay to certify all the relevant exams.
The content was not just Microsoft content. For example, the initial statistics course was actually a course from Columbia University. Some of the content was taught by DataCamp (who I'm not happy with after their data breach), and some by a professor from Norway. This gave the material a wider context.
There was often a choice in the content. For Data Science, you could use either R or Python in each required course. For AI, there was a choice of areas of AI to work in: Speech, Vision, etc.
The work could be done in your own time, and fitted in amongst other activities as you had free time.
Tracks were expanding
Eventually, there were many tracks:
Entry Level Software Development
Thanks are due
Naturally, like with most things, the quality varied across the courses. But overall, I liked the effort that had been put into the classes.
A hearty thank you to anyone who was involved in creating these courses and their associated materials!
As a friend of mine and fellow MVP, Thomas LaRock, said in a recent post, I have no idea what really happens to the certifications that were achieved in the program. As I mentioned, I suspect they have suddenly been massively devalued. And as Thomas did, I'm including my course certificates for posterity.
DevOps has quickly become a core part of how many organizations deliver IT, and in particular, how they deliver applications. But just as quickly as it has become popular, a whole series of XXXOps names have appeared. One of the latest is AIOps. So is it just yet another almost meaningless acronym?
When I first saw the term, I presumed it would be about how to deploy AI-based systems, and I wondered why on earth that would need a special name. But that's not what it is.
So what is AIOps?
AIOps is the use of artificial intelligence (AI) and machine learning (ML) techniques to allow us to analyze IT problems that occur so that we can respond to them fast enough to be useful.
The core problem is that we're now generating an enormous volume of metric and log data about every step of all our processes, and about the health of all our systems and applications, yet so much of that data is never processed, or at least not processed fast enough to be useful.
Only machines can process it fast enough.
The term AIOps seems to have been coined by Will Cappelli (a former analyst with Gartner). In the end, humans won't be scouring logs and responding to what they find. Instead, they'll be teaching the machines what to look for, and how to correlate information from a variety of sources to find what is really going on.
AIOps is the application of artificial intelligence for IT operations. It is the future of ITOps, combining algorithmic and human intelligence to provide full visibility into the state and performance of the IT systems that businesses rely on.
People are already doing this but it's likely in the future that this will become a well-known job role. It will be important to guide the machine's learning to teach it to recognize the appropriate patterns.
If you are working in related IT roles, it might be time to start to add some data science, AI, and/or ML into your learning plans.
I'm presenting a session on Azure DevOps for SQL Server DBAs, designed as an intro for data people who haven't really worked with it before, at Red-Gate's SQL in the City events in Brisbane (May 31), Christchurch (June 7), and Melbourne (June 14).
Looks like a fun lineup for the day, and it'd be great to catch up with you at one of those events. You can find more info here:
Also, the day after each of those events, I'll likely be presenting at SQL Saturday in the same cities. I'll let you know more when the speaking lineups for those events are released, but either way, I'll be at those events too and would love to catch up with you.
I spend a lot of time on client sites, and time and again, one of the mistakes I see people making is trying to start with large projects. I think one of my all-time favorite quotes about IT is:
Any successful large IT system used to be a successful small IT system.
The next time you're thinking about creating a project that's going to have a big bang outcome, please remember this. The history of the industry is that it really is likely to be a big bang, and not in terms of being a big success like you'd hoped for.
A staggeringly high percentage of large IT projects fail.
This is even more important in business intelligence projects. The industry is littered with companies that have spent a fortune on BI projects where almost no one in those companies can, or does, use the outcomes. It's sad but true.
Asking users what they want, gathering a large amount of detail, then building it, testing it, and delivering it to them sounds good, but it almost never works. Unfortunately, it's how many larger organizations think projects need to be managed. They assume that creating an IT project is like building a bridge.
The first problem is that the users don't know what they want. Don't blame them later because you think they didn't give you the right instructions. That's your fault for assuming they can describe exactly what they want. In many cases, until they see some of the outcomes, they won't start to understand what the system will do, and until then, they won't realize what they really need.
Second, no matter how clearly you document their requirements, it won't be what they need. One of my Kiwi buddies, Dave Dustin, was having a bad day once, and I remember he mentioned that he was going to just spend the day delivering exactly what people asked for. That was wonderful and beautifully insightful, because we all know it would lead to disaster. It's little surprise: they might have said "sales" but really meant "profit", and so on.
Finally, the larger the project, the longer it will take to deliver, and by then, even if you'd done it perfectly, the users' needs will have changed, so it still won't be what they want.
When you're starting a BI project, I'd implore you to find a project that has these characteristics:
Small enough to be done in a couple of weeks at most
Based on mocked-up data in front of the target users
Large enough to make a significant difference to someone's life or work
Targeted at someone who's important enough to be a champion for the rest of what you want to achieve
(Ideally) targeted at someone "who pays the bills"
Start by doing a great job, and building on that, once you have support.
I don't know if it's considered some sort of modern trend, but what is it with applications now that just swallow errors instead of dealing with them? Is there an edict within these companies that errors shouldn't be shown, so they can argue their app doesn't have errors?
I'm working with a SaaS app right now. It does editing. Sometimes when I save, it just doesn't save. No error, just nothing saved. Or every now and then, I find the order of what I've entered just gets changed. Again, no error, but the order was changed.
Worse, sometimes when I then try to correct the order, it shows it as done, but next time I go back to that screen, the order is back the way it was in the first place.
On many occasions, if I close my browser, open it again, and log in, it all works OK again for a while.
But it's not just these types of applications. I've lost count of the number of sites I've been to where supposedly serious applications are being developed, yet the code is full of try/catch blocks where the catch blocks are empty, i.e., silently ignoring any errors that occur.
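For illustration, here's that anti-pattern next to a minimal alternative (the save functions here are hypothetical, in Python rather than a try/catch language, but the shape is the same):

```python
import logging

def save_quietly(data, path):
    # Anti-pattern: the empty catch block. Any failure just vanishes,
    # and the caller has no idea whether the save actually worked.
    try:
        with open(path, "w") as f:
            f.write(data)
    except OSError:
        pass  # swallowed

def save_loudly(data, path):
    # Better: at minimum, log the failure and report it to the caller.
    try:
        with open(path, "w") as f:
            f.write(data)
        return True
    except OSError:
        logging.exception("Failed to save to %s", path)
        return False
```

The second version costs two extra lines, yet it leaves a trail in the logs and gives the caller a chance to react.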
How did we get to the point that this is what passes for application development now? Apps that mostly work and fail silently?
System configurations are becoming more complex all the time. Each and every server, container, and key application and service today has log files that tell you a wealth of information about what's going on under the covers. But how accessible are those log files in your organization?
If you aren't using a log management tool, you probably should be.
Here are a few easy ones to get started with:
One thing that I do find frustrating with Microsoft tooling at present is the constant churn of product names. A while back, we had Application Insights, which could collect details of what was happening within an app. The data for it was stored in a tool called Log Analytics, which could also collect operating system logs and more. Agents were provided for on-premises systems.
Originally, these tools had different query languages but, eventually, the Log Analytics query language became the one used across them. It's awesome to be able to write a query to simply find and filter log details.
For my SQL Server buddies, there was SQL Insights, which has now morphed into SQL Server Intelligent Insights, along with Azure SQL Database Intelligent Insights. These let you capture a bunch of info about your SQL Server instances and databases very simply.
I constantly visit client sites where they have purchased tools for this, and those tools aren't even close to being as useful as these Azure ones. And they don't just work with Microsoft servers and services.
Anyway, these have now all been bundled up again under the name Azure Monitor.
Azure Monitor also offers built-in integration with popular DevOps, issue management, ITSM and SIEM tools. You can use packaged solutions for monitoring specialised workloads, or build your own custom integration using Azure Monitor REST APIs and webhooks.
Another interesting offering from our friends at SolarWinds is Papertrail. Their claim is "Frustration-free log management. Get started in seconds. Instantly manage logs from 2 servers… or 2,000". Papertrail seems to be gaining a foothold in the Linux, MySQL, Ruby, Apache, and Tomcat areas, along with many others.
In the end, if you aren't using one of these types of tools, you probably should be.
I used to do a lot of work at the operating system and network level. I was always fascinated watching people use network trace tools when they were trying to debug a problem. The challenge was that they had no idea what was normal activity on the network, and what wasn't.
The end result of this is that they'd then spend huge amounts of time chasing down what were really just red herrings.
When you don't know what normal activity looks like, everything looks odd.
Today, I see the same thing with traces of SQL Server activity, whether captured using SQL Profiler (and/or SQL Trace) or Extended Events. I also see it with insights data sent to Log Analytics, and with the outcomes of many expensive SQL Server monitoring tools.
For example, suppose you are looking at a SQL Server trace and you see a large number of sp_reset_connection commands. Is that an issue? When would it be an issue, and when is it just normal?
If I see an sp_reset_connection executed on a connection followed by a number of other commands, I know that the application is using connection pooling. If however, I see a bunch of those on the same connection, without any commands executed in between, I know that the application code is opening connections when it doesn't need to. Perhaps it should be opening the connection closer to where it decides if it needs it.
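To illustrate the pooling pattern itself (a toy sketch, not production code; the class here is hypothetical and real applications would use a pooling library), here's roughly what a pool does on checkout, which is the counterpart of SQL Server's sp_reset_connection:

```python
import queue
import sqlite3

class TinyPool:
    # Toy connection pool: pre-opened connections are handed out and
    # returned. The rollback on checkout plays the role that
    # sp_reset_connection plays for SQL Server pooling: it clears any
    # leftover session state before the connection is reused.
    def __init__(self, db, size=2):
        self._free = queue.Queue()
        for _ in range(size):
            self._free.put(sqlite3.connect(db, check_same_thread=False))

    def checkout(self):
        conn = self._free.get()
        conn.rollback()  # reset session state before reuse
        return conn

    def checkin(self, conn):
        self._free.put(conn)

# Check the connection out only where it's actually needed,
# and return it as soon as the work is done.
pool = TinyPool(":memory:")
conn = pool.checkout()
result = conn.execute("SELECT 1 + 1").fetchone()
pool.checkin(conn)
```

Well-behaved application code follows that last pattern: checkout, do the work, check straight back in. Repeated checkouts with no work in between are the smell described above.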
The key point is that it's really important that you learn to use these tools before you have a problem. You need to be able to recognize what's normal, and what isn't.