In my last DevOps-related post, I discussed some of the issues that arise when using migration-based deployments.
Migration-based deployments use a series of T-SQL scripts to take your database from its current state to a desired final state.
One of the questions that arises, though, is whether or not these T-SQL scripts should be idempotent, i.e.:
Should the scripts be written so that you get the same outcome if you run them multiple times?
Creating T-SQL scripts that you can reliably run more than once isn't easy. Idempotence isn't strictly required for migration-based deployments if all goes well, but it can make them more reliable, and more useful when errors occur part-way through a deployment. Achieving it, however, takes real effort.
In essence, creating scripts like this means that instead of writing code like this:
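A minimal sketch of the non-idempotent form (the table and column names are just placeholders):

```sql
-- Plain CREATE TABLE: fails with an error if the table already exists
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```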
you write code like this:
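Again as a rough sketch, the guarded equivalent:

```sql
-- Guarded version: only creates the table if it isn't already there
IF NOT EXISTS (SELECT 1
               FROM sys.tables AS t
               INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
               WHERE s.name = N'dbo' AND t.name = N'Customers')
BEGIN
    CREATE TABLE dbo.Customers
    (
        CustomerID int IDENTITY(1,1) PRIMARY KEY,
        CustomerName nvarchar(100) NOT NULL
    );
END;
```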
But what would happen if the table is there but different?
Sometimes it's also messier than you'd hope. For example, CREATE SCHEMA has to be the first statement in a batch. You can't write:
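Something along these lines (with an illustrative schema name) fails outright:

```sql
-- Invalid: 'CREATE SCHEMA' must be the first statement in a query batch
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = N'Reporting')
BEGIN
    CREATE SCHEMA Reporting;
END;
```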
Instead, you need to write something like:
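The usual workaround is dynamic SQL, so that CREATE SCHEMA runs in its own batch:

```sql
-- Workaround: run CREATE SCHEMA in its own batch via dynamic SQL
IF NOT EXISTS (SELECT 1 FROM sys.schemas WHERE name = N'Reporting')
BEGIN
    EXEC (N'CREATE SCHEMA Reporting;');
END;
```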
And apart from now losing help from IntelliSense, etc. within the quoted statements, this sort of thing just gets harder and harder.
My experience is that, apart from fairly trivial situations, creating truly idempotent scripts is much, much harder than it looks, and it's far too easy to create scripts that can still end up with different end states.
I think you are far better off validating the schema before running the migration, and, if a failure occurs, recovering and re-running whenever possible.
That's fine if your script is rigidly about upgrading from X->Y. But what if your script needs to do 'skip-level' upgrades, so that, rather than running a dozen scripts to go from M->Y, you run one script that just does it all? Such scripts DO need to be idempotent, because the same script will run to get someone from anywhere to anywhere. We do this, and it really isn't as hard as it would seem, at least not for our needs. It's a rule that all upgrade scripts must be coded and tested to do the right thing when run multiple times. This not only allows us to do skip-level upgrades, but also restartable upgrades when something inevitably goes wrong (such as in a data transformation where somehow the customer got bad/corrupt data and things go boom).
It would be nice to have simple, short scripts that reliably take you from M->N, N->O, O->P, etc. But for large and very complex databases, that can be seriously difficult to deal with (becoming unwieldy), and can make upgrades take a LOT longer in the field, when you have no control over how often a customer upgrades… especially when a customer only upgrades every two years, but you're producing new versions every sprint.
That's the point. Ideally, all the scripts would be fine, but writing them is just seriously difficult.
For example, if you are adding an index, do you check that the index is already there? Sure. But what about:
* Key columns
* Included columns
* Fillfactor
* PadIndex
* Compression
* Partition scheme
* Filegroup
and so on. Are you really checking that all of those are correct when just adding an index and noting that it's already there? What if there's a different index that someone added that has the same name?
If you aren't checking all of that, you could still end up with different outcomes on different servers. In a perfect world, every deployment would go through the same process, and no bugs would ever be patched directly on a production server. But I see that happen all the time, particularly as DBs get larger.
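To illustrate how much more than a name check is involved, here's a rough sketch that looks at just a few of those properties (the index and table names are hypothetical, and a real comparison would also need the key and included columns, partitioning, and so on):

```sql
-- Existence by name is the easy part...
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.Customers')
             AND name = N'IX_Customers_CustomerName')
BEGIN
    -- ...but matching the definition means comparing many properties,
    -- and this only scratches the surface
    SELECT i.name, i.fill_factor, i.is_padded, p.data_compression_desc
    FROM sys.indexes AS i
    INNER JOIN sys.partitions AS p
        ON p.object_id = i.object_id AND p.index_id = i.index_id
    WHERE i.object_id = OBJECT_ID(N'dbo.Customers')
      AND i.name = N'IX_Customers_CustomerName';
END;
```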
Basically, we generate most of our scripts, using a 'baseline' database and the current database. So if an index is different in definition (in any way), it is scripted out. The script essentially says: if it exists, drop it, then create it correctly. We don't auto-generate deletes ('too dangerous'), so those are hand-scripted, and it's much easier to get those right.
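A minimal sketch of that drop-and-recreate pattern (the names and options here are illustrative, not the actual generated script):

```sql
-- If an index with this name exists in any form, drop it,
-- then recreate it with the definition from the model database
IF EXISTS (SELECT 1 FROM sys.indexes
           WHERE object_id = OBJECT_ID(N'dbo.Customers')
             AND name = N'IX_Customers_CustomerName')
BEGIN
    DROP INDEX IX_Customers_CustomerName ON dbo.Customers;
END;

CREATE INDEX IX_Customers_CustomerName
    ON dbo.Customers (CustomerName)
    WITH (FILLFACTOR = 90);
```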
As part of our CI process, we run the upgrade on various older versions of the database, and then run a complete schema-compare between the upgraded database and the current ('model') database. ANY differences fail the build.
I guess our 'secret sauce' is the automated upgrade script generation that covers 80% or so of the upgrade needs. Data transformations are done by hand, along with schema-delete operations. The end result is an upgrade script that will take any version of the database (from the 'baseline' onwards) up to the current one.
As for customers adding their own indexes and schema, we have strict naming conventions that we enforce, so that we only deal with our own schema, and ignore custom schema. The upgrade scripts are smart enough to recognize 'unknown' FKs into a table we're going to drop or modify, for example, and drop them at run-time with a log message for the client to re-apply after upgrade as necessary.
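A rough sketch of what that kind of run-time handling could look like; the table name, logging style, and the idea of generating the drops dynamically are assumptions here, not the actual implementation being described:

```sql
-- Find foreign keys from other ('unknown') tables that reference dbo.Customers,
-- log each one, and drop it before the table is modified
DECLARE @sql nvarchar(max) = N'';

SELECT @sql = @sql
    + N'PRINT N''Dropping FK ' + REPLACE(fk.name, N'''', N'''''')
    + N' - re-apply after the upgrade if still required.''; '
    + N'ALTER TABLE ' + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
    + N' DROP CONSTRAINT ' + QUOTENAME(fk.name) + N'; '
FROM sys.foreign_keys AS fk
INNER JOIN sys.tables AS t ON t.object_id = fk.parent_object_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.Customers');

EXEC sys.sp_executesql @sql;
```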
I've done something similar in the past, but that's still fairly manual, as processes go. And imagine how you'd get on if it had been deployed to various servers and fixes had been applied inconsistently on those target systems in emergencies. A compare to your 'model' database is fine, but the process assumes the targets actually match that model.
If, for example, a desperate indexing fix was applied slightly differently to target systems, how would you cope? When it was then added to your model database, let's say named the same but implemented differently in some way, how would the upgrade script deal with that?
Our system has been in place for a decade, and actually works pretty well. The naming issue isn't an issue because of our naming conventions. If an existing index is tweaked for a specific customer and their non-standard data distribution (or quantity), the script is stored and reapplied after upgrades in a manual fashion. Our database is big and complex, and our customers are enterprises, so the resources to create and maintain these 'customizations' exist, and we document how to do so without messing up the upgrade.

As I said, the upgrade system has been refined over a decade, and is pretty solid. And idempotent SQL scripts are a core part of it. Obviously this wouldn't work for everyone in every situation. I guess one of the parts that helps it work is that the unit of 'idempotent SQL script' is kept relatively small, with everything broken down into discrete components. A CI build utility that runs full database compares helps keep everything on track, and the fact that we test upgrades against multiple versions with every build helps as well.