SDU Tools v22 is now available (finally)

One of our popular free resources is the SDU Tools library. If you haven't checked it out, I'd encourage you to do so. It's a large library of functions, procedures, and views all written in native T-SQL code.

You can easily use it as a complete library, or use it as examples of how to write T-SQL code. v22 is now available for download.

If you aren't on our notification list, you can add yourself here:

https://sdutools.sqldownunder.com

I'm sorry it's taken a while longer to get this version out than I would have liked, but we've finally caught up after a really busy period.

In v22, we've added the following:

  • Calculate Age in Months – think it's easy to work out ages? Just use DATEDIFF? If you think so, you'd be mistaken. DATEDIFF could even tell you that someone who is one day old is a month old. That's not true where I live. This function fixes that.
  • Last SQL Server Restart (work out when SQL Server was last restarted)
  • Languages (this is a great list of all the world's languages, categorized into their language families, includes the name of the language in English, and in the native language, and has the two and three character ISO codes for them.
  • List Constraints with System Names (you know you should name constraints explicitly and not let the system do it. This tool helps you find the ones that have slipped through)
  • File Path To File Extension, File Path To File Name, File Path To Folder Path – these functions take a full file path and extract out the extension, the file name, and the folder. They work with local file paths, UNC paths, and with Windows and Unix path delimiters.
  • List Untrusted Check Constraints, List Untrusted Foreign Keys – e previously provided tools to attempt to retrust untrusted check and foreign key constraints, but we realised we did not have a tool to find them in the first place. One new tool is used to list any check constraints that are currently untrusted. The other new tool does the same for untrusted foreign keys.

We've also done the usual updates to lists of SQL Server versions and builds, and a few patches to existing tools.

We hope it's useful.

Azure SQL Database now has an improved STRING_SPLIT !

I get pretty excited about new T-SQL enhancements. Back in 2016, I was so pleased to see Microsoft finally add a string split option to T-SQL, but my enthusiasm was limited when I saw how it was implemented. Now that's mostly fixed !

While it's possible to build functions that did string splitting just like I wanted, the problem is that no matter how you implemented them, they were really just too slow. And most importantly, much slower than a built-in function would be.

The Issues

SQL Server 2016 allowed us to write code like this:

That code would return this output:

STRING_SPLIT is a table-valued function. It took two parameters:

  • The string to be split (Unicode or not)
  • The separator

It returned a set of rows with a single column called value, and it was fast.

So what were the problems?

  • The separator could only be a single character
  • Often you need to trim the values being returned. (That can obviously be done outside this function using TRIM but it'd be nice if there was an option to do it in the function).
  • But the biggest issue of all was that because a table-valued function doesn't have an inbuilt order for returning rows, you didn't know which token was which, that had been extracted from the string.

Now a TVF returns a table, and tables don't have built-in orders either. Nor do views. But what was needed was another column that told you which rows was which, and you could then use that for ordering when needed.

Azure SQL Database

STRING_SPLIT in Azure SQL Database now has an optional parameter to enable an ordinal (i.e. a position).

That extra parameter is a bit or an int that says if an ordinal value should be added to the output.

So this code:

Now returns:

I love working with Azure SQL Database for many reasons, but one of those is that you get new language extensions before other parts of SQL Server. I've been using this feature since the day that I first heard it existed, and it's working great. Now the updated documentation is online as well.

It also means that I can obtain any specific token that I want. For example, get me the third token in the string:

And that returns:

This is a much-needed enhancement.

ADF: Use MSIs not SQL Logins whenever possible

Azure Data Factory (ADF) is great for moving data around. It often needs to connect to SQL databases. If you're creating linked services for this, please try to avoid using SQL logins. Don't use usernames and passwords for authentication if you can avoid them.
 
Azure SQL Database lets you can create a user from a data factory's managed service identity (MSI). MSIs are a special type of Service Principal (SP). They provide an identity where you don't need to manage the ID or any sort of password.  Azure does that for you.
 
Each ADF exposes an Azure-based identity on its property page. Here is the main menu for my BeanPerfection data factory:
Data factory properties page
You might have guessed you should go to the Managed identities menu option, but that is for User Managed identities. I'll write more about them another day. The identity we're interested in is a System Managed Identity.
On this properties page, it's actually called the Managed Identity Application ID. Like all other types of Service Principal, Managed Service Identities have an ApplicationID.

Creating a user in Azure SQL Database

I create a user in an Azure SQL Database with the FROM EXTERNAL PROVIDER option. This says that Azure SQL isn't performing the authentication.

The user name is the full name of the ADF. In Azure SQL Database, we can just use the name of the ADF instead of the identity. It will do that for us.

I always make this data factory name lower-case. Once the user is created, I add it to whatever role in the database makes sense. In my example above, I used the role name datafactory_access but there's nothing special about that name.

You need to decide which role to add it to based upon the tasks that the data factory needs to perform. While it's temptingly easy to just add it to db_owner, try to resist that. If in doubt, create a role that's used for data factory access and grant it permissions as they are needed.

Must execute using an AAD-Based Connection

If you just try to execute the statements above, you might find that you get an error message saying:

Msg 33159, Level 16, State 1, Line 8
Principal 'beanperfectiondatafactory' could not be created. Only connections established with Active Directory accounts can create other Active Directory users.

You cannot create any type of Azure AD-based user in an Azure SQL Database, if your connection was authenticated as a SQL login. You must use a connection that was itself made using Azure-AD authentication.

I find that the easiest way to do that, is to make sure I have an Azure Activity Directory Admin assigned for my Azure SQL Server, and then just execute the code right in the Azure Portal. I use the Query Editor tab in the main menu for the Azure SQL Database and connect as that administrator.

 

SQL: Fix: The parameters supplied for the procedure "sp_set_firewall_rule" are not valid.

I often use the procedure sp_set_firewall_rule to set firewall rules for Azure SQL Server. (There's a similar call to set the firewall for databases). The other day though, I got an error that had me puzzled:

I also tried it with named parameters and got the same error.

When I looked at my previous scripts, I realised that I had used a Unicode string for the first parameter previously.

Solution

I changed ‘TestRule’ to N’TestRule’ and it worked fine.

I’ve not seen a procedure before that wants a Unicode string that won’t accept an ASCII string. For example, this works just fine:

Apparently strict data type checking has always been a feature of extended stored procedures and this one checks specifically for a Unicode string. I really don't use extended stored procedures much anyway.

What threw me as well is that I couldn't find it in the list of system stored procedures. It's not there because it's actually an extended stored procedure. These used to mostly have xp prefixes, not sp prefixes, and are a good example of why I don't love using prefixes like that.

I really wish though, that this procedure had a better error message. While the current one is strictly correct, it is not actually all that helpful.

 

New Online Course Released: Advanced T-SQL for Developers and DBAs

I'm really pleased to let you know that our latest online on-demand course is now released:
 
 
To celebrate the release, you can get 25% off the pricing until Aug 14th by using coupon code ATSRELEASE
 
We've had so many requests from customers to bring this course to our online platform. It was always one of our most popular in-person courses, and it's now released and fully updated.
 
The course includes the instruction plus quizzes and the same hands-on labs that we use in the in-person courses.
We've made a big effort with this course to make it really easy for you to do the labs. The labs only require you to have a fairly recent version of  SQL Server Management Studio (SSMS) installed to complete them. You don't need to install anything else. We've provided the required databases online ready for you to connect to.
 
Not on the latest version of SQL Server? Not a problem either. Unlike most other courses, our courses always cover at least all the supported versions of SQL Server and show you what's changed between versions.
 
I hope you enjoy it.

Reliably dropping a SQL Server database if it exists

I often need to write scripts that drop and recreate databases. The hard part of that has always been reliably dropping a database if it already exists. And no, you wouldn't think that would be hard, but it is.

Built in Command

T-SQL has a built-in command for this.

You'd hope that would work, but it doesn't.  I wish it did. The problem is that it will fail if anyone is connected to the DB. And to check if anyone is attached, you first need to check if the DB exists, so it makes the whole "IF EXISTS" part that was added to this command, completely pointless.

Worse, if you have separate code to kick everyone off first, you always have a chance of a race condition, between when you kick everyone off, and when you execute the command.

Nearly OK

Years back, the Microsoft docs library said to drop a database like this:

This was promising, but unfortunately, it has an issue as well. Because you were in the master database when you issued the ALTER, you don't know that you are the single user. So, periodically, that would fail too.

Best Workaround

Over the last few days, we've had a discussion on an MVP list about how to work around this. Many thanks to Paul White, Erland Sommarskog, and Simon Sabin for contributing to it.

The best outcome I have right now is to use this:

To get the DROP to work properly, you need to execute the ALTER DATABASE from within the target database. That way, you end up being the single user, and even though you then execute a change to master, you hold the required session lock on the DB, and then the drop works as expected.

Because you can't have a USE Sales in the script if the Sales DB doesn't exist, this unfortunately has to be done in dynamic SQL code, where it is only executed if the DB does exit.

The last change to tempdb is just protection, if I have a script that then wants to create the DB and change to using it. If that goes wrong, I want to end up creating things in tempdb, not somewhere else like master.

What I wanted

What I've been asking for, and for a very long time, is this:

The ROLLBACK IMMEDIATE needs to be on the DROP DATABASE command, not on a separate ALTER command. Hopefully one day we'll get this.

SQL Interview: #14: Set operations using EXCEPT

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

UNION and UNION ALL are commonly used to combine two sets of rows into a single set of rows.

EXCEPT is another set operator.

Can you explain what it does?

Answer:

EXCEPT is used to remove any rows in the first set of rows, if the same rows appear in the second set.

For example, in the code below:

The query returns all the Trading Names for customers unless a supplier also has that same name.

In other database engines (e.g. Oracle), this operator is called MINUS.

Note: For EXCEPT to work, the queries must have the same number of columns, and compatible data types. The values must also be "comparable". For example, with strings, that means compatible collations, and it means that only comparable data types are supported i.e. spatial data types would not be, as they are not "comparable".

SQL: Password complexity rules for Azure SQL

Azure SQL (both Azure SQL Database and Azure SQL Managed Instance) both have different password complexity rules to SQL Server. I was reading an email discussion list and a poster asked where he could find the list of password complexity rules for Azure SQL. I said I'd never seen a list.

Well it turns out that there is a list, but not where you might have thought to look. They're spelled out in this article:

Identify the right Azure SQL Database SKU for your on-premises database (Data Migration Assistant) – SQL Server | Microsoft Docs

To avoid you reading the whole article, at the time of writing, these were the rules for Azure SQL Database:

  • Your password must be at least 8 characters in length and no more than 128 characters in length.
  • Your password must contain characters from three of the following categories – English uppercase letters, English lowercase letters, numbers (0-9), and non-alphanumeric characters (!, $, #, %, etc.).
  • Your password cannot contain all or part of the login name. (Part of a login name is defined as three or more consecutive alphanumeric characters.)

Slightly more confusing is that the article says it's talking about the server admin password, but it also appears that this same list of restrictions applies to all logon/user passwords as well.

Azure SQL Managed Instance is exactly the same except it requires the password to be at least 16 characters in length.

I hope that helps someone, including myself when I next go looking for this.

 

SQL Interview: #13: Impact of optimize for adhoc workloads

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Server Tuning
Level: Advanced

Question:

SQL Server 2008 introduced a server option for optimize for adhoc workloads.

Can you explain the problem that it is designed to solve, what most commonly causes the problem, and what impact this option has when you enable it?

Answer:

On a SQL Server, you can end up with a large amount of memory being taken up by query plans that will likely never be used again.

There are two primary causes of this situation:

  • The less common cause is that you have a large number of adhoc queries being executed as one-off queries.
  • The most common cause is that you are using a framework like LINQ that does not handle data typing properly, and causes a large number of query plans to be created for each query. You might also be using lower-level frameworks like ADO.NET incorrectly.

As an example, command objects in ADO.NET have a parameters collection. If you add parameters using the AddWithValue() method, you specify the parameter name, and the value, but you do not specify the data type. The problem with this is that the framework then tries to work out what the data type is, from the value.

This means that if you pass a string like 'Hello', then it might guess nvarchar(5) but if you pass a string like 'Hello There', it might guess nvarchar(11).

The lengths of data types are part of the signature for each query plan. This means that you can easily end up with different query plans for every combination of every length of string that has ever been passed to the query. We call this Plan Cache Pollution.

The correct way to fix this is to avoid ever using methods like AddWithValue() and instead using a method where you specify the data type. The challenge here is that many frameworks like LINQ have this behaviour baked in, and the developer cannot change it.

When the optimize for adhoc workloads option has been set, the first time a query is seen, the hash for the query is stored (so it can be remembered) but the query plan is not stored. The next time the same query is seen, the compiled plan is then stored for reuse.

This avoids the plan cache becoming littered with one-off query plans.

Another option taken by some frameworks, is to just use varchar(4000) or nvarchar(4000), or worse nvarchar(max) for all strings. That will of course avoid the plan cache pollution, but it's likely to then cause issues with memory grants in poor quality plans.

 

T-SQL 101: #96 Choosing from alternatives with IIF in SQL Server T-SQL

In my last T-SQL 101 post, I described the CASE statement. Until SQL Server 2012, that was the only real option that we had for choosing between alternate values. In SQL Server 2012, Microsoft gave us another option with the IIF function.

The IF function is very similar to the IF function in Microsoft Excel. It takes three parameters:

  • A boolean value to check (normally this is an expression)
  • A value that will be returned if the first parameter is true.
  • A value that will be returned if the first parameter is false.

In the main image above, you can see the equivalent CASE statement and the simplified IF function equivalent.

Nesting

Again similar to how IF works in Excel, you can nest IF functions. For example, you can write:

In this case, we're saying that if the Size is Large, then 12 will be returned. But otherwise, if the Width is 13, then 0 will be returned. In all other cases, -12 will be returned.

Returned Data Type

The function looks at the data type of the second and third parameters (i.e. the values returned for true and false), and chooses the data type with the highest precedence. For example, if the second parameter is an int, and the third parameter is a bigint, then a bigint will be returned no matter which value is chosen.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.