DevOps: Declarative is where we want to be

If you have ever tried to write scripts to configure operating systems, you’ll know just how hard that is to get right.

For example, if you need to ensure that your virtual machine has two network adapters, and they need to have a specific set of IP addresses, how do you do that?

The traditional approach for PowerShell (and similar tools) was to try to write a step-by-step script to configure the network adapters the way you want. But where do you start? Do you write a script to check for any existing adapters and loop through them to try to remove them? Do you try to add the ones you want, and then remove the others?

You’ll quickly find yourself writing very complex code, because you can’t be sure what your starting point is. Everything has to stay on the path that you prescribed.

And worse, what happens if you run this script more than once?

That’s the problem with imperative code. You are telling the operating system the steps required for configuration.

We don’t want to be doing this.

With a declarative approach, we tell the system how we want it to end up (i.e., the desired state), and let it worry about how to get there. This is what we want to be doing instead of writing imperative code.

PowerShell offers Desired State Configuration (DSC), where you describe the way you want the configuration to end up in a configuration document (in classic DSC, a PowerShell configuration block that is compiled to a MOF file). A bonus of this approach is that it’s idempotent, i.e., no matter how many times you run it, you end up with the same outcome.
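To make the difference concrete, here is a minimal sketch of the declarative idea in Python. It isn’t DSC, and it doesn’t touch a real operating system: the adapter names, IP addresses, and the `plan_changes` and `apply_changes` helpers are all assumptions made up for illustration. The point is that the code compares the current state with the desired state, works out the steps itself, and produces no further changes on a second run.

```python
from typing import Dict, List, Tuple

# Hypothetical model for illustration: adapter name -> IP address.
AdapterState = Dict[str, str]
Action = Tuple[str, str, str]  # (action, adapter name, IP address)


def plan_changes(current: AdapterState, desired: AdapterState) -> List[Action]:
    """Work out the actions needed to move from the current state to the desired state."""
    actions: List[Action] = []
    for name in sorted(current):
        if name not in desired:
            actions.append(("remove", name, current[name]))
    for name in sorted(desired):
        if name not in current:
            actions.append(("add", name, desired[name]))
        elif current[name] != desired[name]:
            actions.append(("set_ip", name, desired[name]))
    return actions


def apply_changes(current: AdapterState, actions: List[Action]) -> AdapterState:
    """Apply the planned actions to an in-memory copy of the state."""
    result = dict(current)
    for action, name, ip in actions:
        if action == "remove":
            result.pop(name)
        else:  # "add" and "set_ip" both leave the adapter holding this IP
            result[name] = ip
    return result


# We only declare the outcome; the starting point can be anything.
desired = {"eth0": "10.0.0.4", "eth1": "10.0.1.4"}
starting_point = {"eth0": "10.0.0.9", "old-nic": "192.168.1.20"}

first_run = apply_changes(starting_point, plan_changes(starting_point, desired))
second_plan = plan_changes(first_run, desired)

print(first_run == desired)  # True
print(second_plan)           # [] (nothing left to do, so a second run is harmless)
```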

Wherever possible, use declarative configuration rather than imperative configuration. In later posts, I’ll talk more about how.

Opinion: There’s a plague we need to stop

I’ve concluded that many software vendors (particularly large ones) don’t understand how much support users of their software provide to each other, and how critical that support is.

The SQL and data communities are a good example of this. When someone has a problem and is wondering how to solve it, they don’t call Microsoft or Google or Oracle (or whichever vendor) first. If they’re lucky, they ask a colleague for help. But most will simply make a Google search (or yes, a Bing search) to try to find an answer.

No matter how obscure an error message might be, if someone else has struggled with it before, at least there’s a chance that on an online forum, someone will have spelled out what caused it for them.

Even cryptic values like app IDs in Windows that look like this:

{ADA41B3C-C6FD-4A08-8CC1-D6EFDE67BE7D}

can be matched to an error or an application that’s causing the error.

Most of this happens without the vendor even being involved.

So one of my pet hates (which Microsoft have heard loud and clear on internal mailing lists) is applications that break this pattern.

Every time I have an error that says:

Oops. Something went wrong!

and nothing else, I want to scream. Even that’s enough to get an answer sometimes. “Every time I click on XXX and drag the YYY, I get an error saying Oops. Something went wrong!” might lead to a posting that solves the issue, but it’s so much tougher when there’s no other info.

A plea to developers:

At the time the error occurs, even if you don’t know exactly what happened, you must know something about what you expected and what happened. Tell us something. No matter how cryptic.

Another related trend is where there is an error message but it’s a GUID:

C3479B1C-B1F3-4404-8F3C-AFA00AA0FEAF

And we think: “Great. We have something to work with” only to find that the GUID changes every time the error occurs and is only meaningful to the support team at the vendor organization.

Please don’t do this either.

Give us something repeatable that we can use to help each other.
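As a rough sketch of what I mean, here’s one way an application could do both: give users a stable, searchable error code, and still give the vendor’s support team a per-occurrence reference. The `AppError` class, the `RPT-0042` code, and the `load_report` function are hypothetical names invented for this example, not part of any real product.

```python
import uuid


class AppError(Exception):
    """An error that carries a stable code plus a per-occurrence reference."""

    def __init__(self, code: str, expected: str, actual: str):
        self.code = code                      # identical every time this failure occurs
        self.expected = expected              # what the application expected
        self.actual = actual                  # what it found instead
        self.incident_id = str(uuid.uuid4())  # changes every time; only useful to support
        super().__init__(self.user_message())

    def user_message(self) -> str:
        return (
            f"Something went wrong [{self.code}]: expected {self.expected}, "
            f"but {self.actual}. Support reference: {self.incident_id}"
        )


def load_report(path: str) -> str:
    """Hypothetical operation that fails with a repeatable, searchable error."""
    try:
        with open(path, encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError as exc:
        raise AppError("RPT-0042", f"a report file at {path}", "no file was found") from exc


try:
    load_report("missing-report.txt")
except AppError as error:
    print(error)
```

The code in square brackets is the part users can search for and quote to each other. The support reference can keep changing without getting in anyone’s way.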

SQL: Code for errors and be pleased when they don’t occur

I spend a lot of time in large organizations that have spent an absolute fortune on highly-available systems, yet when those systems fail over (just as they were designed to do), most of the applications in the building break.

Why?

Because the developers have assumed that nothing ever breaks and have written their code in a far too optimistic manner. Did they do their jobs?

No

Is it possible for their next layer of code to deal with, say, a server disappearing for a few seconds? Of course it is. But it’s not going to happen by accident. It’s even more important in a cloud-based world.

There was a question about deadlocks again recently on one of our local mailing lists. Can you deal with deadlocks?

Yes

Again though, none of this is automatic. But allowing for (and perhaps even expecting) failure is one of the differences in building enterprise level code rather than toy code.

Plan for failure and be pleasantly surprised when it doesn’t happen often. But don’t plan for perfection or you’ll be disappointed.

While it is possible to handle deadlocks within T-SQL code, I prefer to catch them in the next layer of code (let’s call it client code here), as there are other types of errors that should be retried at that level anyway.

Applications should have retry logic to cope with things like:

  • Deadlock (error 1205)
  • Snapshot concurrency violations (error 3960)
  • Server disconnection (can be due to network issues, fail-over of HA-based systems, etc.)
  • Various resource issues on the server

It’s important to get into the habit of assuming that a transaction that you need to apply to the DB might work, rather than assuming that it will work. Always apply it via logic like:

  • While we haven’t applied the transaction to the server, and while the retry time/count hasn’t expired, let’s try to make it happen.
  • If an error occurs, depending upon the error, we might back off for a while and try again.
  • For things like deadlocks, it’s good to have some sort of exponential back-off with a random component.
  • Some errors are pointless to retry (e.g., a primary key violation probably isn’t ever going to work)

Once you build it that way, things become much more robust and resilient. The user should generally be unaware of these issues, apart from a slight processing delay.
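The retry logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DatabaseError` is a hypothetical stand-in for whatever exception your database driver raises, and `run_with_retry` and its parameters are names I’ve made up. Only the two error numbers come from the list above; a real version would also treat disconnections and resource errors as retryable.

```python
import random
import time

# Error numbers from the list above: deadlock and snapshot concurrency violation.
RETRYABLE_ERRORS = {1205, 3960}


class DatabaseError(Exception):
    """Hypothetical stand-in for a driver exception that exposes an error number."""

    def __init__(self, number: int, message: str = ""):
        super().__init__(message or f"database error {number}")
        self.number = number


def run_with_retry(work, max_attempts=5, base_delay=0.1, max_delay=5.0, sleep=time.sleep):
    """Call work() until it succeeds, hits a non-retryable error, or runs out of attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return work()
        except DatabaseError as error:
            # Give up at once on errors that will never succeed, or on the last attempt.
            if error.number not in RETRYABLE_ERRORS or attempt == max_attempts:
                raise
            # Exponential back-off with a random component.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, ceiling))


# Usage: a fake transaction that is chosen as a deadlock victim twice, then succeeds.
attempts = []


def flaky_transaction():
    attempts.append(1)
    if len(attempts) < 3:
        raise DatabaseError(1205)
    return "committed"


result = run_with_retry(flaky_transaction, sleep=lambda seconds: None)
print(result)  # committed
```

The random component matters: if two sessions that deadlocked both wait exactly the same time before retrying, they are likely to collide again.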

DevOps: Should databases use migration-based or state-based deployment?

One key decision that you need to make when you decide to include databases in your DevOps processes is how you will store the data model. What flows from that is how you will do deployment.

Migration-Based Deployment

This is still the most common model today and is a very traditional way to work with databases during development.

At some point, you create an initial database, and after that, you keep every script that’s needed to bring the schema up to the current point. Those scripts will often include data fixes or new values for reference tables, as well as the DDL required for the schema changes.

Over time, when you are using this model, deployments take longer and longer, as more and more scripts need to be applied when upgrading a database. To get around this problem, I often see teams create new seed databases on a regular basis, so that they aren’t ever starting with the very first database.

Most scripts are just created in tools like SQL Server Management Studio, but it’s worth noting that Visual Studio 2017 includes Redgate’s ReadyRoll tooling, which can help this fit better into a source-controlled, DevOps-based environment. (There is a free version included with VS2017 and a more capable Pro version available as an upgrade. The free version is very good though.)

One key problem with this method is that large amounts of time and effort can be wasted when upgrading large databases. Imagine if a developer has had several attempts at designing an index over time, as the requirements for it have changed.

It is not sensible to apply a large index to a database, to then delete it again, to then reapply it slightly differently, to then delete it again, and to then reapply yet another version of it. Yet this is exactly what I see happen on large sites all the time. Columns are added, then removed, then different ones added, and so on, all during a single deployment, as the database moves from update script to update script.
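Here is a small Python sketch of why that happens. It’s an assumption-laden illustration, not how any particular migration tool works: the `pending_migrations` helper, the version numbers, and the table and index names are all hypothetical. A migration runner simply replays every script newer than the database’s current version, in order.

```python
from typing import List, Tuple

Migration = Tuple[int, str]  # (version number, script text)


def pending_migrations(applied_version: int, migrations: List[Migration]) -> List[Migration]:
    """Return the scripts a database still needs, in version order."""
    return sorted(m for m in migrations if m[0] > applied_version)


# Hypothetical script history: the same index is created, dropped, and recreated.
history = [
    (1, "CREATE TABLE dbo.Orders (OrderID int PRIMARY KEY, CustomerID int, OrderDate date);"),
    (2, "CREATE INDEX IX_Orders_Customer ON dbo.Orders (CustomerID);"),
    (3, "DROP INDEX IX_Orders_Customer ON dbo.Orders;"),
    (4, "CREATE INDEX IX_Orders_Customer ON dbo.Orders (CustomerID, OrderDate);"),
]

# A database sitting at version 1 has to run all three index scripts,
# even though only the last one affects the final schema.
for version, script in pending_migrations(1, history):
    print(version, script)
```

On a large table, scripts 2 and 3 are pure wasted effort: the index gets built once only to be thrown away.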

A second problem with this model is that there is no data model that shows a developer (or others) what the database should really look like, apart from a freshly updated database.

A final problem with this model is it assumes that all databases are in one of a known series of states that can be directly upgraded with the same scripts. The problem with this is that schema drift can have occurred. For example, someone needed to patch something on the production server or servers one night, and the changes didn’t exactly make it back into the other environments, including the development environment. Or if they made it back, they weren’t implemented the same way in those other environments.

The result is that the upgrade scripts break.

State-Based Deployment

The alternative to storing a series of scripts is to store the data model instead, and to then use comparison tools to work out what needs to be deployed. There is a lot of benefit to this type of deployment in that the only changes deployed are those that are needed to move from the current state to the required state.

This can be much, much faster.
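As a rough illustration of the comparison idea, here’s a toy Python sketch. It is not how SQL Compare or SSDT actually work, and it ignores the hard part (preserving data). The dictionary-based schema model and the `diff_schema` function are assumptions invented for this example.

```python
from typing import Dict, List

# Hypothetical model for illustration: table name -> {column name -> data type}.
Schema = Dict[str, Dict[str, str]]


def diff_schema(current: Schema, desired: Schema) -> List[str]:
    """Generate only the statements needed to move from current to desired."""
    statements: List[str] = []
    for table in sorted(desired):
        if table not in current:
            columns = ", ".join(f"{name} {kind}" for name, kind in desired[table].items())
            statements.append(f"CREATE TABLE {table} ({columns});")
            continue
        for column, kind in desired[table].items():
            if column not in current[table]:
                statements.append(f"ALTER TABLE {table} ADD {column} {kind};")
            elif current[table][column] != kind:
                statements.append(f"ALTER TABLE {table} ALTER COLUMN {column} {kind};")
        for column in current[table]:
            if column not in desired[table]:
                statements.append(f"ALTER TABLE {table} DROP COLUMN {column};")
    for table in sorted(current):
        if table not in desired:
            statements.append(f"DROP TABLE {table};")
    return statements


current = {"Orders": {"OrderID": "int", "Notes": "varchar(50)"}}
desired = {"Orders": {"OrderID": "int", "Notes": "varchar(200)", "OrderDate": "date"}}

for statement in diff_schema(current, desired):
    print(statement)
# ALTER TABLE Orders ALTER COLUMN Notes varchar(200);
# ALTER TABLE Orders ADD OrderDate date;

print(diff_schema(desired, desired))  # []
```

However many times the model changed during development, the target database only sees the net difference.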

This is what SQL Server Data Tools tries to do with its database projects. (This was originally Visual Studio Team Edition for Database Professionals – you have to love the marketing people who came up with that name).

Visual Studio has a T-SQL development environment in it now, along with schema and data comparison tools. I have to admit to preferring to use SQL Compare from Redgate. It ships in SQL Toolbelt and is simply one of the tools that I’d struggle to work without. It can compare databases with other databases, or even with a set of SQL scripts that make up the database.

We often use their DLM tools (that I’ll discuss another day) to extract the scripts for a data model from TFS or Git, and use those as a comparison source when we’re doing automated deployment.

Whichever tools you use, the challenge is that you need to preserve the existing data while making these changes, and that isn’t always easy. We’ll discuss that more in later posts.

DevOps: Why Don’t Database Developers Use Source Control?

I keep doing work at sites where none of the database code is stored in version control (source control) systems. I keep wondering why that is.

At a recent site, all the source code was in individual files just sitting in a single folder. That’s just not sensible.

I’m left wondering why it is that almost every team that I see working with higher-level languages just assumes that some form of source control would be used, yet it’s almost the opposite when I’m working with data teams.

Having decent source control makes such a difference:

  • No more overwriting changes and losing them.
  • No more wondering what changed between versions, or who changed them.
  • And so on and so on.

There seems never to have been a culture of source control among DBAs, and database developers sit somewhere in between these two worlds.

One aspect of this is tooling.

Vendors like Redgate do a reasonable job with their source control offerings for T-SQL, but some clients want a “pure-Microsoft” solution for some reason.

In earlier versions of SQL Server Management Studio (SSMS), there was support for an SCCI (Source Code Control Interface) provider add-on. That would let you connect SQL Server script projects to source control. Sadly, that disappeared in recent versions of SSMS. I gather that there might be a way to attach the Visual Studio Team Explorer to it, but I haven’t pursued that and I really hope that a standard interface will return soon. I feel that SSMS should interface directly with both TFS and Git as part of a default install. Having tools like this without source control interfaces built in helps to push people in an inappropriate direction.

If, however, you are using SQL Server Data Tools (SSDT) to build your databases, then both TFS and Git are standard connections from Team Explorer.

I just find that I can’t do my database development work directly in SSDT. I find very few people do that. Most use SSMS for development.

I’d love to hear others’ thoughts on why this culture has evolved this way, and how to change it.

DevOps: Infrastructure as Code – What about code quality and management?

For many years now, it has been important to script the configuration and deployment of systems, particularly virtual machines. Infrastructure as Code is now a common requirement, but as the required configuration has become more complex, scripting in languages like PowerShell has become more difficult.

It’s all very well to write code to add say a network adapter, but how do you check the current state of the machine?

  • Did that adapter already exist?
  • Is something else using the IP address?
  • How do you write a script to cater for all the situations?

This leads to ever-more complex code and this is where my concerns start. Writing code for creating infrastructure needs the same discipline that writing any other code does. This includes code quality, coding conventions, error handling, source code control and versioning. Yet, who is writing this code?

What I commonly see is that someone who’s come from a network admin background or similar, and who was able to hack together a few batch jobs successfully, is suddenly creating complex scripts for infrastructure creation. It often reminds me of cable infrastructure in 3rd world countries that sort-of gets the job done, most of the time.

And that’s just more than a bit scary for organizations that depend upon that code.

Regularly I see PowerShell scripts that wouldn’t pass newbie developer 101 tests. They are potentially highly unreliable. I think that organizations need to become much more involved in the management of this type of code, and start to exert appropriate controls on how it’s developed.

Organizations that depend on code the way it’s often currently written are likely to find things going very wrong, just when they can least afford it.

One shining light in this area is the appearance of better and better configuration management tools. Rather than telling the operating system how to configure something step by step (i.e., imperatively), you create a template of how you’d like the machine to be configured (i.e., declaratively) and tell the system to “make itself look like that”.

For PowerShell, that’s Desired State Configuration (DSC).

DSC takes a configuration document that describes the outcome, not the process to achieve the outcome. (In classic DSC, that’s a PowerShell configuration compiled to a MOF file.) And a bonus is that because it’s outcome-driven, it’s also idempotent, i.e., you can run it again and again and still get the same outcome.

If you haven’t explored DSC, I’d encourage you to do so. Either way, it’s time to start to take control over the management of your infrastructure-related code.

If you’d like to learn more about Infrastructure as Code in general, Microsoft have published this course as part of the Microsoft Professional Program for DevOps. You can take the course for free (more details here: https://www.edx.org/course/infrastructure-code-microsoft-devops200-2x-0), or pay to take it if you want to pursue the certification.

DevOps: Load Tests Need to be Part of Your Regular Deployments

One of the principles of DevOps is that you should be able to deploy regularly, with smaller low-risk changes. This addresses a key problem that I see in many sites:

  • Each code release is large
  • The code is difficult to test
  • Code merges are painful
  • Deployments take a long time
  • Deployments often fail
  • Deployments are considered to be high-risk procedures

The end result of this situation is that deployments are avoided and then a snowball effect occurs where:

  • The code is even larger
  • Code merges are even more painful
  • The deployments take even longer
  • The deployments are even more likely to fail
  • Deployments are considered to be even higher-risk procedures

It’s critical to work out how to break this cycle of pain. Smaller changes that are deployed more regularly are often the desired outcome.

But while regular deployments are helpful, so many teams leave performance (and specifically load testing) until way too late.

One of the challenges of my work is that my first involvement with many of these sites is when things are already in a tragic state and the regrets and blaming have started.

The worst I’ve ever seen was a start-up team that was building a new software-as-a-service app. They needed 1000 concurrent users to have a viable business. 5000 concurrent users would be a wonderful business. But after building it for 4 years, and with less than 2 weeks before they needed to show it to the shareholders, they couldn’t get it past 9 concurrent users.

Another team was building an appointment application for a government department. They had built the entire logic on layers and layers of untestable triggers, and some triggers performed a very large number of actions. They could never get the application out of User Acceptance Testing (UAT). What was disappointing is that at that stage (long after it should have already been delivered to the customer), all they were hoping for was to find the minimal enhancement that would just get them over the line for delivery.

That’s really sad for the customer involved. If they don’t accept the app, they have a business issue. If they do accept it, they’ve bought a lemon.

Ongoing, they’re going to hate that app.

Load testing needs to be part of the routine development cycle of any app, and not just left to the end. Even if you haven’t made large changes to the app that you’re building, no doubt you have dependencies on other code or services, and any of those dependencies could have regressed.
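To show how small a routine load check can be, here’s a minimal Python sketch that could run as a step in a deployment pipeline. Everything in it is an assumption for illustration: `fake_request` stands in for a real call to your application, and the user counts and the 0.5 second latency budget are made-up numbers. A real load test would use a proper tool and realistic workloads.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def run_load_test(operation, concurrent_users: int, requests_per_user: int):
    """Run operation() from many threads and return every call's latency in seconds."""

    def one_user(_user_number):
        latencies = []
        for _ in range(requests_per_user):
            started = time.perf_counter()
            operation()
            latencies.append(time.perf_counter() - started)
        return latencies

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        per_user = list(pool.map(one_user, range(concurrent_users)))
    return [latency for user in per_user for latency in user]


def p95(latencies):
    """Return the 95th percentile latency (the last of 19 cut points)."""
    return quantiles(latencies, n=20)[-1]


def fake_request():
    # Hypothetical stand-in for a call to the application under test.
    time.sleep(0.01)


results = run_load_test(fake_request, concurrent_users=20, requests_per_user=5)

# Fail the pipeline step if the latency budget is exceeded.
assert p95(results) < 0.5, "95th percentile latency exceeded the budget"
print(f"{len(results)} requests, p95 = {p95(results):.3f}s")
```

The design choice that matters is the assertion: if the check fails, the deployment fails, so a regression in your own code or in a dependency is caught on the day it appears rather than two weeks before the demo.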

DevOps: Microsoft Professional Program for DevOps

In the second half of 2016, I enrolled in the Microsoft Professional Program for Data Science, and completed it in early 2017. I have to say that I really enjoyed it overall. It was a bit challenging at times but I don’t regret doing it.

If you want to get the certification, you need to enroll in the verified option for each course. Nowadays, that’s pretty much $99 USD per course. You can do it for free, and if you’re tight on funds, perhaps that’s what you should do. I like to support the concept, and like to support both Microsoft and edX for creating these options. They are doing amazing work, so while I hear people say to just do the courses and not contribute to them, I can’t say that I agree.

edX and their partners offer an incredible range of world-class courses that you can take for free, but if you want them to continue, you should consider contributing. And that applies to the non-Microsoft ones too.

I think that programs like these are more likely to be the real future for Microsoft certification in general.

Earlier this year, Microsoft created a Professional Program for DevOps. I’ve had an interest in DevOps for a long time, and I got the opportunity to help create one of the courses, DevOps for Databases, with the inimitable Steve Jones from Redgate Software. Databases are a particularly challenging area for DevOps.

A few months back I decided to start pursuing this professional program as well. I’ve got one course to go (the container one) before the final capstone project. I can finish that container course in the next three months, but unfortunately the capstone project won’t be available until April.

Here’s the overall program:

Over the last few weeks, I’ve been involved in enhancing the existing Monitoring and Testing courses, and am looking forward to seeing how people find the updated versions.

To support my continuing interest in DevOps, in the upcoming weeks, you’ll see DevOps-related posts from me.