SQL: Code for errors and be pleased when they don't occur

I spend a lot of time in large organizations that have spent an absolute fortune on highly-available systems, yet when those systems fail over (just as they were designed to do), most of the applications in the building break.

Why?

Because the developers have assumed that nothing ever breaks and have written their code in a far too optimistic manner. Did they do their jobs?

No

Is it possible for their next layer of code to deal with, say, a server disappearing for a few seconds? Of course it is. But it's not going to happen by accident. It's even more important in a cloud-based world.

There was a question about deadlocks again recently on one of our local mailing lists. Can you deal with deadlocks?

Yes

Again though, none of this is automatic. But allowing for (and perhaps even expecting) failure is one of the differences between building enterprise-level code and building toy code.

Image by Ryan Fields

Plan for failure and be pleasantly surprised when it doesn't happen often. But don't plan for perfection or you'll be disappointed.

While it is possible to handle deadlocks within T-SQL code, I prefer to catch them in the next layer of code (let's call it client code here), as there are other types of errors that should be retried at that level anyway.

Applications should have retry logic to cope with things like:

  • Deadlock (error 1205)
  • Snapshot concurrency violations (error 3960)
  • Server disconnection (can be due to network issues, fail-over of HA-based systems, etc.)
  • Various resource issues on the server

It's important to get into the habit of assuming that a transaction you need to apply to the database might work, rather than assuming that it will. Always apply it via logic like the following (a code sketch appears after this list):

  • While we haven't applied the transaction to the server, and while the retry time/count hasn't expired, let's try to make it happen.
  • If an error occurs, depending upon the error, we might back off for a while and try again.
  • For things like deadlocks, it's good to have some sort of exponential back-off with a random component.
  • Some errors are pointless to retry (for example, a primary key violation is probably never going to succeed).
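
While my preference is to handle this in the client code, the same pattern can be sketched in T-SQL. The sketch below is only an illustration, not production code: the dbo.Orders table and the INSERT statement are hypothetical placeholders, and a real implementation would use an exponential back-off with a random component rather than a fixed delay.

DECLARE @RetryCount int = 0;
DECLARE @MaxRetries int = 5;
DECLARE @Succeeded bit = 0;

WHILE @Succeeded = 0 AND @RetryCount < @MaxRetries
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;
        -- the work we need to apply goes here (hypothetical example)
        INSERT dbo.Orders (CustomerID, OrderDate) VALUES (123, SYSDATETIME());
        COMMIT;
        SET @Succeeded = 1;
    END TRY
    BEGIN CATCH
        IF XACT_STATE() <> 0 ROLLBACK;
        IF ERROR_NUMBER() IN (1205, 3960) -- deadlock or snapshot concurrency violation
        BEGIN
            SET @RetryCount += 1;
            WAITFOR DELAY '00:00:01'; -- fixed delay for the sketch only; use exponential back-off plus a random component in real code
        END
        ELSE
        BEGIN
            THROW; -- errors like primary key violations will never succeed on retry
        END;
    END CATCH;
END;

IF @Succeeded = 0
    RAISERROR('The transaction could not be applied after %d attempts.', 16, 1, @MaxRetries);

The same shape (try, back off, retry, give up after a limit) applies equally well when the loop lives in C#, Java, or whatever the client layer is written in.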

Once you build it that way, things become much more robust and resilient. The user should generally be unaware of these issues, apart from a slight processing delay.

Shortcut: Reset Window Layout in SQL Server Management Studio

One of the problems with applications that have highly-configurable user interfaces (UI) is that users can end up configuring them in ways they hadn't intended, and then don't know how to get back to where they were.

I remember the first time that I was at a session with a presenter from Microsoft showing the (at the time) new personalization options in ASP.NET. You could build a website and let the user determine how the site should be laid out, to suit themselves.

Overall, I can't say that I really like working with websites like that, but I can understand the potential appeal. I can also easily see how end users could end up really messed up.

I remember asking the presenter if there was a simple button that put the site back the way it was initially developed and removed the user's modifications, so that a user could always just get back to square one.

He told me "ah no, there isn't an option like that".

I'm glad that @sqltoolguy's team, who work on SQL Server Management Studio (SSMS), aren't part of that thinking. While SSMS is very configurable, I have seen people get themselves into a real mess with its window management. They dragged a window when they meant to drag something else, or performed some other action that changed the UI, and the change stuck. Then they didn't know how to "fix" it.

In SSMS, there's a wonderful option in the Window menu, that does just what's needed:

Reset Window Layout is the "get me back to where I was" menu item.

SDU Tools: Date of Easter Sunday

One of the endless challenges when working with databases (and not just SQL Server databases) is processing dates and times.

While there are some good date-handling functions in the T-SQL language, there are many more gaps in what's offered. This month, we've filled another one of those gaps.

In Australia, whether staff are religious or not, the Christian Easter is associated with a four-day weekend for most workers (except those in retail): Friday (Good Friday), Saturday (Holy Saturday), Sunday (Easter Sunday), and Monday (Easter Monday). For many, it looks like this:

Image by Leio McLaren

If you are creating a date dimension in a data warehouse, you might need to know when holidays like this occur. I mentioned in an earlier post that not everyone can tell you the current rules for Leap Years (so we built a tool for that), but even those that can would typically struggle if you asked them how to work out when Easter occurs in each year.

Claus Tøndering has an amazing set of pages called his Calendar FAQ. I highly encourage you to read it. The background it provides on how various calendars and dates work is fascinating. For what we need for this tool, though, Claus describes how to calculate the date of Easter Sunday here:

http://www.tondering.dk/claus/cal/easter.php#wheneasterlong

As part of our free SDU Tools for developers and DBAs, we've added an implementation of the calculations that he describes on that page, in a function called DateOfEasterSunday. (If you need to do this in other languages, Claus also has an algorithm section that might help, but we decided to implement his logic directly in T-SQL.)
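
To give a sense of what's involved, here is a minimal sketch using the widely published "anonymous Gregorian" computus, which produces the same dates as the method Claus describes. This is not the SDU Tools source; the variable names and the DATEFROMPARTS output are simply choices made for this sketch.

-- Sketch: date of Easter Sunday for a given year (Gregorian calendar)
DECLARE @Year int = 2024;
DECLARE @a int = @Year % 19;  -- position in the 19-year lunar cycle
DECLARE @b int = @Year / 100;
DECLARE @c int = @Year % 100;
DECLARE @d int = @b / 4;
DECLARE @e int = @b % 4;
DECLARE @f int = (@b + 8) / 25;
DECLARE @g int = (@b - @f + 1) / 3;
DECLARE @h int = (19 * @a + @b - @d - @g + 15) % 30;
DECLARE @i int = @c / 4;
DECLARE @k int = @c % 4;
DECLARE @l int = (32 + 2 * @e + 2 * @i - @h - @k) % 7;
DECLARE @m int = (@a + 11 * @h + 22 * @l) / 451;
DECLARE @EasterMonth int = (@h + @l - 7 * @m + 114) / 31;
DECLARE @EasterDay int = ((@h + @l - 7 * @m + 114) % 31) + 1;

SELECT DATEFROMPARTS(@Year, @EasterMonth, @EasterDay) AS EasterSunday; -- 2024-03-31

With the tools installed, calling the function itself would presumably be as simple as SELECT SDU_Tools.DateOfEasterSunday(2024); (assuming the default SDU_Tools schema), but check your installation for the exact schema name.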

You can see the tool used in the main image above, and you can watch it in action here:

For more details on our free SDU Tools, or to register for them, please follow this link:

http://sdutools.sqldownunder.com

Opinion: You have to live and breathe the technology to be good at it

Digital Transformation and Cloud Transformation are phrases that I hear bandied around at nearly every large organization that I'm currently doing consulting work for.

Yet, in so many cases, I can't see the organization achieving the changes required. This is for two core reasons:

  • The first is that the culture within the organizations is a major hurdle. There just isn't enough flexibility to think outside the box about alternative ways to work.
  • Worse (and probably more concerning), I see these companies taking advice on how to make these transformations from companies who don't themselves "get it".

An organization that is cloud-antagonistic internally, and stuck in an endless IT management quagmire, isn't likely to make a good cloud transformation itself, and it's certainly not going to be a good partner to help you make a cloud migration or implement a cloud transformation within your company.

An organization that doesn't use business intelligence (BI) or analytics internally isn't going to be able to help you make that transition either.

If an organization is claiming to be proficient in an area of technology, ask them how they are using those same technologies themselves. As a simple example, ask them about the internal analytics that they can see on their own phones.

To be any good at any of these areas of technology, companies need to live and breathe them daily. If they don't, find someone to help you who does.

SQL: Are big SQL Server databases really slower?

One question that I'm asked all the time when consulting is whether reducing the size of database tables will make queries run faster or not.

The underlying question is typically about whether the client should implement some sort of archiving strategy, to reduce the amount of data in a table by moving older data off into another table.

My answer is that it might help, but if it does, you probably have another issue that would be a better one to solve instead.

When you need to read some data from a table, and you know where the data is, it doesn't matter how much other data is in the table. The only time that the amount of data matters is if you don't know where the data you are looking for is located.

Imagine you have walked into a physical library with a very large number of books. If you look in an index of authors to find the book by your favorite author that you want to read, and it tells you exactly where it is in the library, does it really make a difference if the library is expanded to double the number of books?

Image by J Zamora

You might have to walk a little further, but the process is still much the same, and the time taken will be much the same. But if you didn't have that index of authors, it makes a huge difference, because now you'd have to look through twice as many books to find the one that you need.

Having more data in a table than you need to access is only a problem if you are reading all that data.

The tell-tale signs of this sort of problem are obvious. If you have queries that get slower as the system holds more data, you probably have an issue to solve, and it's likely an indexing problem.
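
As a hypothetical illustration (the table and column names here are invented for the example), an index like the following lets a query locate one customer's rows with a seek, and the cost of that seek barely changes as the table grows:

-- Hypothetical table and index; the point is that the index supports the query's predicate
CREATE INDEX IX_Orders_CustomerID
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, TotalValue);

SELECT OrderDate, TotalValue
FROM dbo.Orders
WHERE CustomerID = 12345; -- an index seek, regardless of how many other customers' rows the table holds

Without that index, the same query is a scan, and its cost grows in line with the size of the table, which is exactly the "gets slower as the system holds more data" pattern described above.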

DevOps: Should databases use migration-based or state-based deployment?

One key decision that you need to take when you decide to include databases in your DevOps processes is how you will store the data model. What flows from that is how you will do deployment.

Migration-Based Deployment

This is still the most common model today and is a very traditional way to work with databases during development.

At some point, you create an initial database, and after that, you keep every script that's needed to bring the schema up to the current point. Those scripts will often include data fixes or new values for reference tables, as well as the DDL required for the schema changes.

Over time, when you are using this model, deployments take longer and longer, as more and more scripts need to be applied when upgrading a database. To get around this problem, I often see teams create new seed databases on a regular basis, so that they aren't ever starting with the very first database.

Most scripts are just created in tools like SQL Server Management Studio, but it's worth noting that Visual Studio 2017 includes Red-Gate's ReadyRoll tooling, which can help this fit better into a source-controlled, DevOps-based environment. (There is a free version included with VS2017 and a more capable Pro version available as an upgrade. The free version is very good though.)

One key problem with this method is that large amounts of time and effort can be wasted when upgrading large databases. Imagine if a developer has had several attempts at designing an index over time, as the requirements for it have changed.

It is not sensible to apply a large index to a database, to then delete it again, to then reapply it slightly differently, to then delete it again, and to then reapply yet another version of it. Yet this is exactly what I see happen on large sites all the time. Columns are added, then removed, then different ones added, and so on, all during a single deployment, as the database moves from update script to update script.
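
As a hypothetical illustration of that churn (the script names and index definitions are invented), a single deployment might end up running something like this:

-- 0041_AddTransactionDateIndex.sql
CREATE INDEX IX_Transactions_TransactionDate ON dbo.Transactions (TransactionDate);

-- 0057_WidenTransactionDateIndex.sql
DROP INDEX IX_Transactions_TransactionDate ON dbo.Transactions;
CREATE INDEX IX_Transactions_TransactionDate ON dbo.Transactions (TransactionDate, CustomerID);

-- 0083_ReshapeTransactionDateIndex.sql
DROP INDEX IX_Transactions_TransactionDate ON dbo.Transactions;
CREATE INDEX IX_Transactions_TransactionDate ON dbo.Transactions (TransactionDate) INCLUDE (CustomerID, Amount);

That's three expensive index builds on a large table, where a state-based comparison would have generated only the final one.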

A second problem with this model is that there is no data model that shows a developer (or others) what the database should really look like, apart from a freshly updated database.

A final problem with this model is that it assumes all databases are in one of a known series of states that can be upgraded directly with the same scripts. The problem is that schema drift may have occurred. For example, someone needed to patch something on the production server or servers one night, and the changes didn't quite make it back into the other environments, including the development environment. Or, if they did make it back, they weren't implemented the same way in those other environments.

The result is that the upgrade scripts break.

State-Based Deployment

The alternative to storing a series of scripts is to store the data model instead, and to then use comparison tools to work out what needs to be deployed. There is a lot of benefit to this type of deployment in that the only changes deployed are those that are needed to move from the current state to the required state.

This can be much, much faster.

This is what SQL Server Data Tools tries to do with its database projects. (This was originally Visual Studio Team Edition for Database Professionals – you have to love the marketing people who came up with that name).

Visual Studio has a T-SQL development environment in it now, along with schema and data comparison tools. I have to admit to preferring to use SQL Compare from Red-Gate. It ships in SQL Toolbelt and is simply one of the tools that I'd struggle to work without. It can compare databases with other databases, or even with a set of SQL scripts that make up the database.

We often use their DLM tools (that I'll discuss another day) to extract the scripts for a data model from TFS or Git, and use those as a comparison source when we're doing automated deployment.

Whichever tools you use, the challenge is that you need to maintain the data state while making these changes. And that can be a challenge. We'll discuss that more in later posts.

Shortcut: Toggle Full Screen Mode in SQL Server Management Studio (SSMS)

SQL Server Management Studio (SSMS) is a great tool and it has lots of helpful menu items and toolbar items. Unfortunately, all these items take up screen real estate.

You can see that the default screen layout could be considered a bit cluttered if you really just want to focus on the particular query that you're working on.

A keyboard shortcut can help here. Alt-Shift-Enter toggles full screen mode in SSMS.

Note how it gives you much more screen real estate to work with:

And the same shortcut toggles it back.

SDU Tools: Percent Encode a T-SQL String

Percent Encoding (also known as URL Encoding) is a technique for encoding characters in strings that can't be represented directly, according to the rules of the target location. For example, not all characters are permitted in URLs.

Percent encoding a T-SQL string is also useful when working with XML. For example, one common method for splitting strings is based on the use of XML PATH but if you try to use those types of string splitters with strings containing, say, percent signs, you'll find they just don't work.

One of our free SDU Tools for SQL Server DBAs and developers is PercentEncode. It can perform the encoding for you. It is based on this Wikipedia article on percent-encoding:

https://en.wikipedia.org/wiki/Percent-encoding

It conforms to RFC 3986 (https://tools.ietf.org/html/rfc3986), and so it encodes all characters apart from the following:

  • A to Z
  • a to z
  • 0 to 9
  • hyphen
  • underscore
  • period
  • tilde
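
If you have the tools installed (they deploy to an SDU_Tools schema by default), a call would presumably look something like the following; the input string is just an invented example, and the expected output follows from the rules above:

SELECT SDU_Tools.PercentEncode(N'this string/needs encoding!') AS EncodedValue;
-- Expected result: this%20string%2Fneeds%20encoding%21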

You can see an example of it in the image above, and you can see it in action here:

You'll find more information about our free SDU Tools here:

http://sdutools.sqldownunder.com

Opinion: Vendors who prevent patching should be liable for issues

When SQL Server customers are asked why they haven't kept up to date with either SQL Server versions or, more importantly, patches to SQL Server, the answers usually boil down to one of two reasons:

  • They are just slack
  • Their vendors won't support the later version or patch level

Many SQL Server application vendors don't keep up to date with testing of their applications on released versions or patches for SQL Server.

While I can understand a hesitation to quickly support later versions of the product, refusing to support later patches of supported versions is particularly concerning. Worse, actively telling customers to avoid installing security patches is deeply troubling.

Preventing clients from installing security patches is simply not reasonable.

If there is a proven issue with a patch, that's understandable. But if the main reason is that the vendor just hasn't done the work to test the patch, I believe vendors who do this need to bear liability for any ensuing issues, regardless of any license agreement that tries to exclude consequential damages from the use of, or inability to use, their products.

SQL: Using UNION when you should use UNION ALL is a performance problem

I spend a lot of time reviewing SQL Server T-SQL code from developers. There are quite a few issues that I see regularly, and one of these relates to UNION.

In SQL Server, a UNION statement is actually a UNION DISTINCT (but we're not allowed to put the word DISTINCT in the syntax). The alternative is a UNION ALL. Often a UNION ALL is what's needed, and using a UNION leads to unnecessary performance problems.

In the WideWorldImporters database, I can write this UNION ALL based statement:
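
Something along these lines gives the idea (the column names here are taken from the standard WideWorldImporters schema; the exact statement shown in the image may differ slightly):

SELECT CustomerName AS ContactName, PhoneNumber, FaxNumber, WebsiteURL
FROM Sales.Customers
UNION ALL
SELECT SupplierName, PhoneNumber, FaxNumber, WebsiteURL
FROM Purchasing.Suppliers;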

The problem that I commonly see is that developers write this query with UNION instead of UNION ALL. When they do that, SQL Server has to perform a sort (and DISTINCT) operation across the entire returned rowset, sorting by every column.

That's OK if you really do need to return only distinct values, but most of the time I see it done for rows that can never be the same.

In this example, even if the customer and supplier have the same name, the other columns will be different so using UNION instead of UNION ALL is very painful. SQL Server is a fast database engine, but there's probably no need here to have it sorting all the names, URLs, phone and fax numbers, etc. before the query output is produced.

Whenever you are considering a UNION, please consider using UNION ALL if that's what you really mean.