SDU Tools: Percent Encode a T-SQL String

Percent Encoding (also known as URL Encoding) is a technique for encoding the values in strings that are not able to be directly represented, according to the rules for the target location. For example, not all characters are permitted in URLs.

Percent encoding a T-SQL string is also useful when working with XML. For example, one common method for splitting strings is based on the use of XML PATH but if you try to use those types of string splitters with strings containing, say, percent signs, you'll find they just don't work.

One of our free SDU Tools for SQL Server DBAs and developers is PercentEncode. It can perform the encoding for you. It is based on this Wikipedia PercentEncoding article:

https://en.wikipedia.org/wiki/Percent-encoding

It conforms with RFC 3986 (https://tools.ietf.org/html/rfc3986) and so it encodes all characters apart from the following

  • A to Z
  • a to z
  • 0 to 9
  • hyphen
  • underscore
  • period
  • tilde

You can see an example of it in the image above, and you can see it in action here:

You'll find more information about our free SDU Tools here:

http://sdutools.sqldownunder.com

 

 

Opinion: Vendors who prevent patching should be liable for issues

When many SQL Server customers are asked why they haven't kept up to date with either SQL Server versions, or more importantly, patches to SQL Server, the answers usually boil down to two reasons:

  • They are just slack
  • Their vendors won't support the later version or patch level

Many SQL Server application vendors don't keep up to date with testing of their applications on released versions or patches for SQL Server.

While I can understand a hesitation to quickly support later versions of the product, refusing to support later patches of supported versions is particularly concerning. Worse, actively telling customers to avoid installing security patches is deeply troubling.

Preventing clients from installing security patches is simply not reasonable.

If there is a proven issue with a patch, that's understandable. But if the main reason is that the vendor just hasn't done the work to test the patch, I believe that vendors who do this need to bear liability for any ensuing issues that occur, regardless of their license agreement that might try to exclude consequential damages from use or inability to use their products.

 

SQL: Using UNION when you should use UNION ALL is a performance problem

I spend a lot of time reviewing SQL Server T-SQL code from developers. There are quite a few issues that I see regularly, and one of these relates to UNION.

In SQL Server, a UNION statement is actually a UNION DISTINCT (but we're not allowed to put the word DISTINCT in the syntax). The alternative is a UNION ALL. Often a UNION ALL is what's needed, and using a UNION leads to unnecessary performance problems.

In the WideWorldImporters database, I can write this UNION ALL based statement:

The problem that I commonly see is that developers write this query with UNION instead of UNION ALL. When they do that, SQL Server has to perform a sort (and DISTINCT) operation across the entire returned rowset, sorting by every column.

That's ok if you really need to only return distinct values, but most times I see this done for values that cannot be the same.

In this example, even if the customer and supplier have the same name, the other columns will be different so using UNION instead of UNION ALL is very painful. SQL Server is a fast database engine, but there's probably no need here to have it sorting all the names, URLs, phone and fax numbers, etc. before the query output is produced.

Whenever you are considering a UNION, please consider using UNION ALL if that's what you really mean.

DevOps: Why Don't Database Developers Use Source Control?

I keep doing work at sites where none of the database code is stored in version control (source control) systems. I keep wondering why that is.

At a recent site, all the source code was in individual files just sitting in a single folder. That's just not sensible.

I'm left wondering why it is that almost every team that I see working with higher-level languages just assumes that some form of source control would be used, yet it's almost the opposite when I'm working with data teams.

Having decent source control makes such a difference:

  • No more overwriting changes and losing them.
  • No more wondering what changed between versions, or who changed them.
  • And so on and so on.

There seems to have never been a culture of source control among DBAs; and database developers are somewhere in between these two worlds.

One aspect of this is tooling.

Vendors like Red-Gate do a reasonable job with their source control offerings for T-SQL but some clients want a "pure-Microsoft" solution for some reason.

In earlier versions of SQL Server Management Studio (SSMS), there was support for an SCCI (Source Code Control Interface) provider add-on. That would let you connect SQL Server script projects to source control. Sadly, that disappeared in recent versions of SSMS. I gather that there might be a way to attach the Visual Studio Team Explorer to it but I haven't pursued that and I really hope that a standard interface will return soon. I feel that SSMS should interface directly with both TFS and Git as part of a default install. Having tools like this without source code interfaces built in, helps to push an inappropriate direction.

If however, you are using SQL Server Database Tools (SSDT) to build your databases, then both TFS and Git are standard connections from Team Explorer.

I just find that I can't do my database development work directly in SSDT. I find very few people do that. Most use SSMS for development.

I'd love to hear others' thoughts on why this culture has evolved this way, and how to change it.

Shortcut: When did my T-SQL query finish?

It's likely that everyone who uses SQL Server Management Studio (SSMS) knows how to tell how long a query ran for. You can see it in the bottom right of the status bar when a query finishes.

But one question that often comes up with a long-running query is when did my query finish?

That's not in the status bar and many people don't seem to be aware that you can find it out.

It's part of the data in the Properties window. So when you come to a query window where the query has finished, and you're wondering when it finished, hit F4 (or right-click in the window and click Properties), and you'll see this info shown:

SDU Tools: Trim Whitespace in T-SQL

Today's post is about one of our free SDU Tools that helps you with what you'd think would be a simple task in T-SQL but isn't.

TrimWhitespace is used to remove leading and trailing whitespace characters in T-SQL strings. As well as spaces, it will remove carriage returns, linefeeds, and tabs.

You can see its action in the image above.

You can also see it in action here:

For more information on our free SDU Tools for SQL Server developers and DBAs, follow this link:

http://sdutools.sqldownunder.com

 

Opinion: Case Sensitivity is a Pox on Computing

Case sensitivity in comparisons is an aspect of computing that I’m surprised is still so widespread. I can’t say it more clearly than this:

It’s a pox on computing and needs to be eradicated.

(https://www.phrases.org.uk/bulletin_board/50/messages/583.html)

I’ve recently been working at a site at present where a new case-sensitive SQL Server system is being implemented. I cannot begin to describe what a poor idea I think this is.

In the end, all that a case sensitive system allows you to do is to have:

  • Multiple identifiers exist
  • They exist in the same scope
  • The names of the identifiers differ only by case

You’d have a hard time convincing me that that would ever be a good idea.

At least not for a system used by typical humans. No sensible person is ever going to be comfortable with “John Smith” being a different name to “john smith”. And similarly, do you really want a single database table with a CustomerID column, a CustomerId column, and a customerID column?

Well I certainly don’t.

In the new system that I mentioned, there are columns ending in ID, Id, etc. They haven’t even been consistent in the naming of their case-sensitive objects.

And yes, I hear the “but what about private variables vs properties in languages like C#?” complaint:

  • Age is the object’s property
  • age is where the object stores the value assigned to the property

Surely we can come up with a better naming convention than that. I’ve lost count of how many times I’ve seen people using the property when they meant the private variable or vice versa. It’s just not sensible.

Now before I hear complaints that case matters, be clear that I’m not talking about case preservation; that’s an entirely different thing. Yes, if I defined a column as CustomerName, I don’t care if I query it by customername, Customername, etc. I want it coming back as CustomerName ie: however I defined it. Case preservation is a virtue; it’s case sensitivity that I see as an almost always painful and unnecessary thing.

Worse, if you’ve ever tested an application against a case-sensitive server, you’ll understand the challenges involved. It’s hard to get case-sensitive code correct.

I gather that SQL Server Management Studio has a current bug that arises when you remove and re-add a database from an availability group on a case-sensitive server. Why? It appears that they aliased a table with A in one place, and used the alias as a in another place. It’s really nonsense to have a situation where that matters but it highlights the other big issue. It makes for fragile applications.

Image by Michał Parzuchowski

Do you really want to be the one who’s testing all your applications and 3rd party utilities to find out if they’ve tested case-sensitivity properly? Do the tools that you use have a better testing regime than SSMS? I’ll bet that most don’t. And what that means is that you get to spend your life wading through obscure tooling issues.

No thanks. Life is too short.

SQL: Do I Still Need to Run DBCC CHECKDB?

In short: YES

(In contradiction to Betteridge's Law of Headlines)

Every now and then, customers ask me if they really need to run DBCC CHECKDB. There was even a question that came up about this on a private mailing list full of people who really should already understand why. There is no mystery here.

DBCC CHECKDB checks the database for physical readability (are the pages intact and can they be read from the I/O system). This makes people wonder that if the I/O subsystem is already doing this, why does SQL Server need to do this?

Importantly, DBCC CHECKDB checks for logical consistency as well.

That’s incredibly important.

SQL Server is a fabulous database but it has no way of coping with internal structures that are messed up. For example, links on pages to other pages, but now pointing to the wrong pages.

Fortunately, because SQL Server is such a good database, having to deal with logical corruption caused by the database engine messing up is a truly rare event. That could happen though if there was a serious bug in SQL Server.

But even if SQL Server is perfect, it could be led astray. A good example would be a multi-path I/O system. Imagine this scenario:

  • The database has two paths to the I/O subsystem (multi-path I/O)
  • It writes a page out one path
  • It reads the same page from the other path
  • The I/O subsystem returns an older version of the page from its cache, not the one that was just written

That sort of thing could easily lead to internal corruption of the database. So could issues with your memory subsystem.

Fortunately, the SQL Server team again have your back. It is designed to help to detect some of these issues too. It probably can’t fix them but it might help to find them. For example, in the multi-path I/O problem mentioned above, SQL Server keeps a list of checksums of recently-written pages. If it reads one of those pages and it gets a page with a valid checksum but a different checksum to the one that it recently wrote, it lets you know that something went very wrong.

Even cleverness like this though isn’t going to help if the culprit is the storage subsystem having a cache bug, and so on, and so on.

Finally, DBCC CHECKDB is so important because it lets you find issues quickly. Otherwise, the logical problem would get much, much worse and be difficult, if not impossible to correct. You don’t want to be the DBA who has a large number of backups but every one of them has the same corrupted data.

DevOps: Infrastructure as Code – What about code quality and management?

For many years now, it has been important to script the configuration and deployment of systems, particularly virtual machines. Infrastructure as Code is now a common requirement but as the required configuration has become more complex, scripting in language like PowerShell has become more difficult.

It’s all very well to write code to add say a network adapter, but how do you check the current state of the machine?

  • Did that adapter already exist?
  • Is something else using the IP address?
  • How do you write a script to a cater for all the situations?

This leads to ever-more complex code and this is where my concerns start. Writing code for creating infrastructure needs the same discipline that writing any other code does. This includes code quality, coding conventions, error handling, source code control and versioning. Yet, who is writing this code?

Image by Wes Hicks

What I commonly see is that someone who’s come from a network admin background or similar, and who was able to hack together a few batch jobs successfully, suddenly is creating complex script for infrastructure creation. It often reminds me of cable infrastructure in 3rd world countries that sort-of gets the job done, most of the time.

And that’s just more than a bit scary for organizations that depend upon that code.

Image by Janko Ferlič

Regularly I see PowerShell scripts that wouldn’t pass newbie developer 101 tests. They are potentially highly unreliable. I think that organizations need to become much more involved in the management of this type of code, and start to exert appropriate controls on how it’s developed.

Organizations that depend on code the way it's often currently written, are likely to find things going very wrong, just when they can least afford them to do so.

Image by Sergi Viladesau

One shining light in this area is the appearance of better and better configuration management tools.. Rather than telling the operating system how to configure something step by step (ie: imperatively), you create a template of how you’d like the machine to be configured (ie: declaratively) and tell the system to “make itself look like that”.

For PowerShell, that’s Desired State Configuration (DSC).

DSC takes a JSON-based template that describes the outcome, not the process to achieve the outcome. And a bonus is that because it’s outcome driven, it’s also idempotent ie: you can run it again and again and still get the same outcome.

Image from MSDN

If you haven’t explored DSC, I’d encourage you to do so. Either way, it’s time to start to take control over the management of your infrastructure-related code.

If you'd like to learn more about Infrastructure as Code in general, Microsoft have published this course as part of the Microsoft Professional Program for DevOps. You can take the course for free (more details here: https://www.edx.org/course/infrastructure-code-microsoft-devops200-2x-0) , or pay to take it if you want to pursue the certification.

Shortcut: Using the Clipboard Ring in SSMS

Two key combinations used by SQL Server T-SQL developers every day are Ctrl-C and Ctrl-V for copy and paste.

But many users of SQL Server Management Studio (SSMS) don’t realize that it has a clipboard ring and can deal with several objects in the clipboard at the same time.

Let’s see an example.

In this screen shot, I’ve opened a query window with the source code of the AnalyzeTableColumns procedure from SDU Tools.

I might want to work with the parameters for that procedure, so I double-click and hit Ctrl-C for each of the parameter names.

I’ve then opened a new query window where I want to work. If I hit Ctrl-V, I just get the last value that I copied, as expected:

However, instead of using Ctrl-V, if I use Ctrl-Shift-V, I see the same value, but if I continue to hit Ctrl-Shift-V, I see the previous clipboard entries, one at a time. I can then use one or more of the other parameter values that I copied before:

This is one of those things that once you get used to it, you’ll wonder how you ever worked without it.