DevOps: Should databases use migration-based or state-based deployment?

One key decision that you need to make when you include databases in your DevOps processes is how you will store the data model. What flows from that is how you will do deployment.

Migration-Based Deployment

This is still the most common model today and is a very traditional way to work with databases during development.

At some point, you create an initial database, and after that, you keep every script that's needed to bring the schema up to the current point. Those scripts will often include data fixes or new values for reference tables, as well as the DDL required for the schema changes.

Over time, when you are using this model, deployments take longer and longer, as more and more scripts need to be applied when upgrading a database. To get around this problem, I often see teams create new seed databases on a regular basis, so that they aren't ever starting with the very first database.
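
As a rough sketch of that model (assuming numbered .sql scripts in a folder, a hypothetical dbo.AppliedMigration tracking table, and the Invoke-Sqlcmd cmdlet from the SqlServer module), a migration-based deployment is essentially "apply, in order, every script that hasn't been applied yet":

```powershell
# Minimal migration-runner sketch. The folder, table, server and database names are hypothetical.
Import-Module SqlServer

$server     = 'MyServer'
$database   = 'Sales'
$scriptPath = 'C:\Repos\Sales\Migrations'   # scripts named 0001_xxx.sql, 0002_xxx.sql, ...

# Which scripts have already been applied? (Assumes a dbo.AppliedMigration tracking table.)
$applied = Invoke-Sqlcmd -ServerInstance $server -Database $database `
    -Query 'SELECT ScriptName FROM dbo.AppliedMigration' |
    Select-Object -ExpandProperty ScriptName

foreach ($script in Get-ChildItem $scriptPath -Filter '*.sql' | Sort-Object Name) {
    if ($applied -notcontains $script.Name) {
        # Apply the script, then record that it has been applied.
        Invoke-Sqlcmd -ServerInstance $server -Database $database -InputFile $script.FullName
        Invoke-Sqlcmd -ServerInstance $server -Database $database `
            -Query "INSERT dbo.AppliedMigration (ScriptName) VALUES ('$($script.Name)')"
    }
}
```

Tools like ReadyRoll automate this pattern, but the underlying idea is the same: the history of scripts is the source of truth.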

Most scripts are just created in tools like SQL Server Management Studio, but it's worth noting that Visual Studio 2017 includes Red-Gate's ReadyRoll tooling, which can help this fit better into a source-controlled, DevOps-based environment. (There is a free version included with VS 2017 and a more capable Pro version available as an upgrade. The free version is very good though).

One key problem with this method is that large amounts of time and effort can be wasted when upgrading large databases. Imagine if a developer has had several attempts at designing an index over time, as the requirements for it have changed.

It is not sensible to apply a large index to a database, to then delete it again, to then reapply it slightly differently, to then delete it again, and to then reapply yet another version of it. Yet this is exactly what I see happen on large sites all the time. Columns are added, then removed, then different ones added, and so on, all during a single deployment, as the database moves from update script to update script.

A second problem with this model is that there is no data model that shows a developer (or others) what the database should really look like, apart from a freshly updated database.

A final problem with this model is that it assumes all databases are in one of a known series of states that can be upgraded with the same scripts. In practice, schema drift may have occurred. For example, someone needed to patch something on the production server or servers one night, and the changes didn't make it back into the other environments, including development. Or, if they did make it back, they weren't implemented the same way in those other environments.

The result is that the upgrade scripts break.

State-Based Deployment

The alternative to storing a series of scripts is to store the data model instead, and then to use comparison tools to work out what needs to be deployed. The big benefit of this type of deployment is that the only changes deployed are those needed to move from the current state to the required state.

This can be much, much faster.

This is what SQL Server Data Tools tries to do with its database projects. (This was originally Visual Studio Team Edition for Database Professionals – you have to love the marketing people who came up with that name).
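
As a sketch of what that looks like in practice (assuming the database project has been built into a .dacpac and the SqlPackage utility is available on the path; the file, server and database names here are hypothetical), deployment becomes a single compare-and-publish step rather than a chain of upgrade scripts:

```powershell
# Publish the desired state (the compiled .dacpac) to a target database.
# SqlPackage compares the model against the target and generates only the changes needed.
& SqlPackage.exe /Action:Publish `
    "/SourceFile:C:\Build\Sales.dacpac" `
    "/TargetServerName:MyServer" `
    "/TargetDatabaseName:Sales" `
    "/p:BlockOnPossibleDataLoss=True"    # refuse changes that would lose data

# Using /Action:Script (with an output path) generates the change script for review
# instead of applying it directly.
```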

Visual Studio has a T-SQL development environment in it now, along with schema and data comparison tools. I have to admit to preferring to use SQL Compare from Red-Gate. It ships in SQL Toolbelt and is simply one of the tools that I'd struggle to work without. It can compare databases with other databases, or even with a set of SQL scripts that make up the database.

We often use their DLM tools (that I'll discuss another day) to extract the scripts for a data model from TFS or Git, and use those as a comparison source when we're doing automated deployment.

Whichever tools you use, the challenge is that you need to preserve the existing data while making these changes, and that can be difficult. We'll discuss that more in later posts.

 

DevOps: Why Don't Database Developers Use Source Control?

I keep doing work at sites where none of the database code is stored in version control (source control) systems. I keep wondering why that is.

At a recent site, all the source code was in individual files just sitting in a single folder. That's just not sensible.

I'm left wondering why it is that almost every team that I see working with higher-level languages just assumes that some form of source control would be used, yet it's almost the opposite when I'm working with data teams.

Having decent source control makes such a difference:

  • No more overwriting changes and losing them.
  • No more wondering what changed between versions, or who changed them.
  • And so on and so on.

There seems never to have been a culture of source control among DBAs, and database developers sit somewhere between those two worlds.

One aspect of this is tooling.

Vendors like Red-Gate do a reasonable job with their source control offerings for T-SQL but some clients want a "pure-Microsoft" solution for some reason.

In earlier versions of SQL Server Management Studio (SSMS), there was support for an SCCI (Source Code Control Interface) provider add-on that would let you connect SQL Server script projects to source control. Sadly, that disappeared in recent versions of SSMS. I gather there might be a way to attach the Visual Studio Team Explorer to it, but I haven't pursued that, and I really hope a standard interface will return soon. I feel that SSMS should interface directly with both TFS and Git as part of a default install. Shipping tools like this without source control interfaces built in helps push people in an inappropriate direction.

If, however, you are using SQL Server Data Tools (SSDT) to build your databases, then both TFS and Git are standard connections from Team Explorer.

I just find that I can't do my database development work directly in SSDT, and I find very few people who do. Most use SSMS for development.
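
Even without source control support in SSMS itself, one pragmatic stop-gap is to script the objects out to files and commit those. Here's a minimal sketch using SMO (the server, database and repo paths are hypothetical, and it assumes the SqlServer module is installed):

```powershell
# Script each user table out to its own file so it can be committed to Git or TFS.
Import-Module SqlServer

$smoServer = New-Object Microsoft.SqlServer.Management.Smo.Server 'MyServer'
$db        = $smoServer.Databases['Sales']
$repoPath  = 'C:\Repos\Sales\Tables'        # hypothetical local clone of the repo

foreach ($table in $db.Tables | Where-Object { -not $_.IsSystemObject }) {
    $file = Join-Path $repoPath "$($table.Schema).$($table.Name).sql"
    $table.Script() | Out-File -FilePath $file -Encoding utf8
}

# Then the usual workflow applies: git add . ; git commit -m "Schema snapshot"
```

It's not a substitute for proper tooling, but it at least gives you a history of what changed and who changed it.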

I'd love to hear others' thoughts on why this culture has evolved this way, and how to change it.

DevOps: Infrastructure as Code – What about code quality and management?

For many years now, it has been important to script the configuration and deployment of systems, particularly virtual machines. Infrastructure as Code is now a common requirement, but as the required configuration has become more complex, scripting in languages like PowerShell has become more difficult.

It’s all very well to write code to add, say, a network adapter, but how do you check the current state of the machine?

  • Did that adapter already exist?
  • Is something else using the IP address?
  • How do you write a script that caters for all of those situations? (See the sketch below.)
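
As a rough illustration of where that leads (using the built-in NetAdapter and NetTCPIP cmdlets; the adapter name and IP address are made up), even a simple "assign an IP address" task turns into state-checking code:

```powershell
# Hypothetical example: assign a static IP, but only after checking the current state.
$adapterName = 'Ethernet0'      # assumed adapter name for illustration
$targetIp    = '10.0.0.50'

# Did that adapter already exist?
$adapter = Get-NetAdapter -Name $adapterName -ErrorAction SilentlyContinue
if (-not $adapter) {
    throw "Adapter '$adapterName' does not exist on this machine."
}

# Is something else already using the IP address?
$existing = Get-NetIPAddress -IPAddress $targetIp -ErrorAction SilentlyContinue
if ($existing -and $existing.InterfaceAlias -ne $adapterName) {
    throw "IP address $targetIp is already assigned to $($existing.InterfaceAlias)."
}

if (-not $existing) {
    New-NetIPAddress -InterfaceAlias $adapterName -IPAddress $targetIp -PrefixLength 24
}
```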

This leads to ever-more complex code, and this is where my concerns start. Writing code to create infrastructure needs the same discipline that writing any other code does: code quality, coding conventions, error handling, source control, and versioning. Yet who is writing this code?


What I commonly see is someone who’s come from a network admin background or similar, who was able to hack together a few batch jobs successfully, and who is suddenly creating complex scripts for infrastructure creation. It often reminds me of the cabling in third-world countries that sort-of gets the job done, most of the time.

And that’s just more than a bit scary for organizations that depend upon that code.


Regularly I see PowerShell scripts that wouldn’t pass newbie developer 101 tests. They are potentially highly unreliable. I think that organizations need to become much more involved in the management of this type of code, and start to exert appropriate controls on how it’s developed.

Organizations that depend on code the way it’s often currently written are likely to find things going very wrong, just when they can least afford it.


One shining light in this area is the appearance of better and better configuration management tools. Rather than telling the operating system how to configure something step by step (i.e., imperatively), you create a template of how you’d like the machine to be configured (i.e., declaratively) and tell the system to “make itself look like that”.

For PowerShell, that’s Desired State Configuration (DSC).

DSC takes a declarative configuration (compiled into a MOF document) that describes the outcome, not the process to achieve the outcome. And a bonus is that because it’s outcome-driven, it’s also idempotent, i.e., you can run it again and again and still get the same outcome.
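
As a small illustrative example (the feature and file used here are just placeholders, and the WindowsFeature resource assumes a Windows Server target), a DSC configuration describes resources and their desired state, gets compiled, and is then applied:

```powershell
# A minimal DSC configuration: describe the end state, not the steps to get there.
Configuration WebServerConfig {
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node 'localhost' {
        # Ensure IIS is installed
        WindowsFeature IIS {
            Name   = 'Web-Server'
            Ensure = 'Present'
        }
        # Ensure a placeholder page exists
        File DefaultPage {
            DestinationPath = 'C:\inetpub\wwwroot\index.html'
            Contents        = '<h1>Hello from DSC</h1>'
            Ensure          = 'Present'
        }
    }
}

WebServerConfig -OutputPath 'C:\DscOutput'                   # compiles the configuration to a MOF document
Start-DscConfiguration -Path 'C:\DscOutput' -Wait -Verbose   # apply it; re-running it is safe (idempotent)
```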


If you haven’t explored DSC, I’d encourage you to do so. Either way, it’s time to start to take control over the management of your infrastructure-related code.

If you'd like to learn more about Infrastructure as Code in general, Microsoft have published this course as part of the Microsoft Professional Program for DevOps. You can take the course for free (more details here: https://www.edx.org/course/infrastructure-code-microsoft-devops200-2x-0), or pay to take it if you want to pursue the certification.

DevOps: Load Tests Need to be Part of Your Regular Deployments

One of the principles of DevOps is that you should be able to deploy regularly, with smaller low-risk changes. This addresses a key problem that I see in many sites:

  • Each code release is large
  • The code is difficult to test
  • Code merges are painful
  • Deployments take a long time
  • Deployments often fail
  • Deployments are considered to be high-risk procedures

The end result of this situation is that deployments are avoided and then a snowball effect occurs where:

  • The code is even larger
  • Code merges are even more painful
  • The deployments take even longer
  • The deployments are even more likely to fail
  • Deployments are considered to be even higher-risk procedures

It’s critical to work out how to break this cycle of pain. Smaller changes deployed more regularly are often the desired outcome.

But while regular deployments are helpful, so many teams leave performance (and specifically load testing) until way too late.

One of the challenges of my work is that my first involvement with many of these sites is when things are already in a tragic state and the regrets and blaming start.


The worst I’ve ever seen was a start-up team that was building a new software as a service app. They needed 1000 concurrent users to have a viable business. 5000 concurrent users would be a wonderful business. But after building it for 4 years, and with less than 2 weeks before they needed to show it to the shareholders, they couldn’t get it past 9 concurrent users.

Another team was building an appointment application for a government department. They had built the entire logic on layers and layers of untestable triggers, and some triggers performed a very large number of actions. They could never get the application out of User Acceptance Testing (UAT). What was disappointing was that at that stage (long after it should already have been delivered to the customer), all they were hoping for was to find the minimal enhancement that would just get them over the line for delivery.

That’s really sad for the customer involved. If they don’t accept the app, they have a business issue. If they do accept it, they’ve bought a lemon.


Ongoing, they’re going to hate that app.

Load testing needs to be part of the routine development cycle of any app, and not just left to the end. Even if you haven’t made large changes to the app that you’re building, no doubt you have dependencies on other code or services, and any of those dependencies could have regressed.
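
This is nothing like a proper load testing rig, and dedicated tools do this far better, but even a crude smoke test wired into every deployment will catch gross regressions early. A minimal sketch, assuming PowerShell 7+ and a hypothetical health endpoint:

```powershell
# A crude concurrency smoke test (PowerShell 7+; the endpoint URL and user count are hypothetical).
$url        = 'https://myapp.example.com/health'
$concurrent = 50

$results = 1..$concurrent | ForEach-Object -Parallel {
    $sw = [System.Diagnostics.Stopwatch]::StartNew()
    try {
        Invoke-WebRequest -Uri $using:url -UseBasicParsing | Out-Null
        $ok = $true
    } catch {
        $ok = $false
    }
    # Record whether the request succeeded and how long it took.
    [pscustomobject]@{ Success = $ok; Milliseconds = $sw.ElapsedMilliseconds }
} -ThrottleLimit $concurrent

$results | Measure-Object -Property Milliseconds -Average -Maximum
"Failed requests: $(($results | Where-Object { -not $_.Success }).Count)"
```

If a check like this runs on every deployment, a sudden jump in response times or failure counts shows up while the change that caused it is still small and fresh.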

 

DevOps: Microsoft Professional Program for DevOps

In the second half of 2016, I enrolled in the Microsoft Professional Program for Data Science, and completed it in early 2017. I have to say that I really enjoyed it overall. It was a bit challenging at times but I don't regret doing it.

If you want to get the certification, you need to enroll in the verified option for each course. Nowadays, that's pretty much $99 USD per course. You can do it for free, and if you're tight on funds, perhaps that's what you should do. I like to support the concept, and like to support both Microsoft and edX for creating these options. They are doing amazing work, so while I hear people say to just do the courses and not contribute to them, I can't say that I agree.

edX and their partners offer an incredible range of world-class courses that you can take for free, but if you want them to continue, you should consider contributing. And that applies to the non-Microsoft ones too.

I think that programs like these are more likely to be the real future for Microsoft certification in general.

Earlier this year, Microsoft created a Professional Program for DevOps. I've had an interest in DevOps for a long time, and I got the opportunity to help create one of the courses, DevOps for Databases, with the inimitable Steve Jones from Redgate Software. Databases are a particularly challenging area for DevOps.

A few months back I decided to start pursuing this professional program as well. I've got one course to go (the container one) before the final capstone project. I can finish that container course in the next three months, but unfortunately the capstone project won't be available until April.


Over the last few weeks, I've been involved in enhancing the existing Monitoring and Testing courses, and am looking forward to seeing how people find the updated versions.

To support my continuing interest in DevOps, in the upcoming weeks, you'll see DevOps-related posts from me.