One key decision that you need to take when you decide to include databases in your DevOps processes, is how you will store the data model. What flows from that is how you will do deployment.
This is still the most common model today and is a very traditional way to work with databases during development.
At some point, you create an initial database, and after that, you keep every script that’s needed to bring the schema up to the current point. Those scripts will often include data fixes or new values for reference tables, as well as the DDL required for the schema changes.
Over time, when you are using this model, deployments take longer and longer, as more and more scripts need to be applied when upgrading a database. To get around this problem, I often see teams create new seed databases on a regular basis, so that they aren’t ever starting with the very first database.
Most scripts are just created in tools like SQL Server Management Studio, but it’s worth noting that Visual Studio 2017 includes Red-Gate’s Ready Roll tooling that can help this fit better into a source controlled DevOps based environment. (There is a free version included with VS2017 and a more capable Pro version available as an upgrade. The free version is very good though).
One key problem with this method is that large amounts of time and effort can be wasted when upgrading large databases. Imagine if a developer has had several attempts at designing an index over time, as the requirements for it have changed.
It is not sensible to apply a large index to a database, to then delete it again, to then reapply it slightly differently, to then delete it again, and to then reapply yet another version of it. Yet this is exactly what I see happen on large sites all the time. Columns are added, then removed, then different ones added, and so on, all during a single deployment, as the database moves from update script to update script.
A second problem with this model is that there is no data model that shows a developer (or others) what the database should really look like, apart from a freshly updated database.
A final problem with this model is it assumes that all databases are in one of a known series of states that can be directly upgraded with the same scripts. The problem with this is that schema drift can have occurred. For example, someone needed to patch something on the production server or servers one night, and the changes didn’t exactly make it back into the other environments, including the development environment. Or if they made it back, they weren’t implemented the same way in those other environments.
The result is that the upgrade scripts break.
The alternative to storing a series of scripts is to store the data model instead, and to then use comparison tools to work out what needs to be deployed. There is a lot of benefit to this type of deployment in that the only changes deployed are those that are needed to move from the current state to the required state.
This can be much, much faster.
This is what SQL Server Data Tools tries to do with its database projects. (This was originally Visual Studio Team Edition for Database Professionals – you have to love the marketing people who came up with that name).
Visual Studio has a T-SQL development environment in it now, along with schema and data comparison tools. I have to admit to preferring to use SQL Compare from Red-Gate. It ships in SQL Toolbelt and is simply one of the tools that I’d struggle to work without. It can compare databases with other databases, or even with a set of SQL scripts that make up the database.
We often use their DLM tools (that I’ll discuss another day) to extract the scripts for a data model from TFS or Git, and use those as a comparison source when we’re doing automated deployment.
Whichever tools you use, the challenge is that you need to maintain the data state while making these changes. And that can be a challenge. We’ll discuss that more in later posts.