Everyone who's worked with SQL Server for any length of time, has had the experience of executing a T-SQL script, and then noticing, with horror, that they've just executed the script against the wrong server.
You know the feeling. It even happens at Christmas time, just when you were hoping to get away from work for a few days, or when you are the unlucky one who's doing on call work.
Many of these surprises would be avoided if there was something that gave you a visual clue that you were connected to the wrong server.
SQL Server Management Studio (SSMS) has had an option to color code connections to servers for quite a while. The solution isn't perfect and isn't as good as Mladen Prajdić's SSMS Tools Pack:
When you first start working with SQL Server, you're often working with smaller databases and every backup seems to happen quickly. SQL Server is fast at creating backups.
When you get to larger databases, backups start to take a while. Fortunately, backup is one of the commands that sets a percentage complete value that can be seen in the sys.dm_exec_requests system view.
And when the databases get even larger, the question quickly becomes:
When will my backup finish?
The good news is that we've built a tool to make that easy to work out.
One of the SDU Tools is ShowBackupCompletionEstimates. It shows you any currently executing backups, how complete they are, when they started, how long they've taken so far (in seconds), and when we estimate that it will complete. Because it can cope with more than one backup running at a time, we also added the last SQL statement that was executed in the session, to make it easier to work out which is which.
Here's an example of its output (wrapped for clarity):
You can see it action here:
For more information on our free SDU Tools, please visit this site:
I work with a lot of SQL Server databases that are poorly normalized. One of my pet dislikes is the column to rule them all.
Here are simple tests:
If I ask you what's stored in a column and you can't tell me a single answer , then you've got a problem.
If you need to refer to another column to work out what's in the first column, then you've got a problem.
Here are some examples:
If you have a column (let's call it ObjectID) that sometimes holds a TeamMemberID, sometimes it's a CoachID, sometimes it's a TeamID, etc. then you have a design problem.
If you must refer to another column (let's call is ObjectType), to work out what's in the ObjectID column, then you have a design problem.
Instead of a combination of ObjectType and ObjectID, I'd rather see you have a TeamMemberID column that's nullable, a CoachID column that's nullable, a TeamID column that's nullable, etc. And at least there's a chance that you could one day even have foreign keys in the database, and some chance of integrity. (But that's a topic for another day).
One of the strangest reasons that I've heard for this was to "try to minimize the number of columns in the table". Please don't say that. No sensible person is going to ever exceed the limits.
Prior to SQL Server 2008, the limit for the number of columns per table was 1024.
It's hard to imagine what you'd use more than that for, but the SharePoint team asked to have that increased. Apparently, 10,000 columns wasn't enough, so we ended up with 30,000 columns per table now. I struggle to think about what type of design leads to that many columns but it's also why we got SPARSE columns and filtered indexes in that version. (Mind you, filtered indexes were a great addition to the product on their own, unrelated to SPARSE columns). Let's just leave that reason as "oh, SharePoint". Can't say I love their database design, at all.
But for the rest of us, limiting the number of columns in a table isn't a valid reason for messing up normalization, particularly when those columns are keys from other tables.
It's hard to believe that we're back to Christmas time. I just wanted to take a moment to thank all those who've read my blog during the year and look forward to great interactions again next year.
It's been a big year for us. I've moved to a new blog, moved to a new website, and moved to a new house. All have been "entertaining" but I'm happy with the outcome in each case.
I hope you all have a happy Christmas period, no matter what your belief system is. For most it's a holiday period regardless.
If you are travelling, take care and travel safely. We had another tragedy in Melbourne yesterday, and it just shows how fleeting life can be.
圣诞快乐 to my Chinese readers too. I've made quite a bit of progress on my Chinese this year. The most interesting (yet sad) thing I did was to MC a memorial service for a friend's father, in both English and Mandarin. That was quite a challenge. Many of the things that I had to say were in very formal language. But I think I got through it OK. I'd never spoken to a crowd of people in Mandarin before, and particularly not a crowd of Chinese speakers. I just hope I didn't mess it up too much.
Over the years, I've had a surprising number of questions on how to right-align a set of numbers using T-SQL.
The first concept that people seem to miss is that numbers aren't strings. So there's no concept of actually aligning a number, it's only a string representation of a number that can be right-aligned or left-aligned. Or how it's displayed in a client application.
But if you really want to create a string that has right-aligned numbers, then left padding of the number is what you want.
One of our free SDU Tools is LeftPad. (We also have RightPad).
Here's an example:
I've set NOCOUNT on first (just to clean up the output by removing rowcounts).
I've then called LeftPad using the word Hello (the value to be padded), then 10 as the total width, and an o as the padding character.
In the second example, I've padded the value 18 with zeroes.
And in the 3rd and 4th examples, I've right-aligned some decimal values.
As I mentioned earlier, there's also a RightPad function, which can be useful for creating fixed width output strings.
You can see them both in action here:
For more information on our free SDU Tools, visit here:
In T-SQL, a script is a set of one or more batches.
For example, if we have the following script and click Execute, it looks like all the commands were sent to the server and executed all at once:
But that isn't what happened.
What did happen is that SQL Server Management Studio (SSMS) found the word GO and broke the script into a series of batches. In this case, there were three batches. First, it sent the commands shown here as Batch 1 to the server, waited for them to execute, then sent Batch 2, waited for it to execute, then sent Batch 3, and waited for it to execute.
It looks like it just happened all at once, but it didn't.
The key thing to understand about GO is that it's not a T-SQL command. It's a batch separator. We use GO to separate batches of commands that are part of a single script. The word GO is never sent to SQL Server. It only has meaning to the client tool (ie in this case SSMS).
In fact, you can change it. One of my friends that's a Star Trek fan has all his scripts with ENGAGE rather than GO.
You can see that I've changed it in Tools -> Options :
And now if I open a new query window, that works just fine:
I can't recommend doing that as your scripts won't be much use to anyone else.
So keep in mind that it's the client that understands GO, not the server. And that leads us to our shortcut of the day.
You can add a count after the word GO, and then SSMS will send your commands from that batch to the server more than once:
Notice that Intellisense in SSMS doesn't understand the syntax (there is a red squiggle) but it works just fine.
I find this shortcut quite useful if I want to execute a command a substantial number of times. An example is when I want to insert a bunch of default rows into a table; another is for executing commands while testing.
I’ve written before about how important it is to find code smells.
One of these relates to foreign keys. I’ve argued in previous posts about the importance of foreign keys but if you do have them in place, you need to index them.
When SQL Server creates a primary key, it creates an index under the covers to support the primary key. The index has the same name as the key. (And is one of the reasons why you should name your primary keys and not let the system do it for you). A primary key needs to be both unique and NOT NULL. So SQL Server creates an index so it can quickly check if a value already exists.
So if I have a Sales.Customers table with a CustomerID as the primary key, there will be an index on the CustomerID column.
SQL Server also creates an index to support unique constraints. Again, this is because it needs to be able to check whether the value already exists.
Foreign keys are where the problems can occur because SQL Server does not automatically create an index for these. As an example, if I have a Sales.Orders table and it has a CustomerID column declared as a foreign key, when I’m inserting or updating an order, I don’t need an index on the CustomerID column in the Sales.Orders table. I’m using the primary key index on CustomerID in the Sales.Customers table to perform the lookup.
The problem with this logic is:
What is the chance that you’ll want to find all the orders for a particular customer? (pretty high)
If you want to delete a customer, what happens?
For the first issue, you certainly want to have an index. The problem is that an index on just the CustomerID column might well not be a great index for that query. It’s likely you want other key columns or included columns in the index. For example, you might want the date of the order.
For the second issue of deletes though, things can get very nasty. Deleting a single row from Sales.Customers requires a full scan of the entire Sales.Orders table to make sure the customer doesn’t have any orders.
I really wish that SQL Server automatically indexed foreign keys unless you used some sort of I_KNOW_WHAT_IM_DOING option to turn it off ie: tell SQL Server that you'll deal with it.
A huge number of performance issues would be solved by just doing that.
So where does the tool fit it?
One of the code smells that we go looking for, is declared foreign keys where the foreign key columns aren’t the left-most columns of at least one non-clustered index.
(I know that might take a moment to sink in).
As an example, with our table from before, if in the Sales.Orders table I declare CustomerID as a foreign key to the Sales.Customers.CustomerID column, I’d better find at least one nonclustered index that has CustomerID as the first column. I don’t care if the index is on CustomerID and OrderDate, as long as CustomerID is the first key in the index.
One of our SDU Tools is ListNonIndexedForeignKeys. It’s a stored procedure that takes 3 parameters. The first parameter is the database name. The second parameter is a list of schemas or the word ALL (which is the default). The third parameter is a list of tables or the word ALL (which again in the default).
So to just check a whole database, you can just execute:
This example checked the whole WideWorldImporters database. When I was designing that database, I made a conscience decision to not index the LastEditedBy column. You can see that the tool flags that.
You can see the tool in action here:
For more information on our free SDU Tools, look here:
Intrusions into computer systems are happening all the time now. We need to address this issue as an industry, but it’s important to understand that the way we design databases plays a big role in the impacts that occur during intrusions.
A bug in any one of the frameworks that you use, the code that you write, the protocols that you use, the operating system or hosting services that you use can potentially expose you to an intrusion.
So do we just give up?
No, what you need to ensure is that when an intrusion occurs, the damage or impact is minimized. We do this in all other industries. For example, people working in high locations don’t expect to fall but they (generally) make sure that if they do, while something nasty might happen, it won’t be disastrous.
I routinely see web applications and middleware that can access any part of a database that it wants. The developers love this as it’s easy to do. But it exposes you to major risks. If the application is trampled on, you’ve opened up everything.
I always want to put mitigation in place and to limit the damage.
If your plan is to have your application connect to the database as one user, and you make that user a database owner (db0), or a combination of db_datareader and db_datawriter, or worse, a system administrator; then you don't have a plan.
A better plan is this:
Create a schema for the application – let’s call it WebApp
In the WebApp schema, create only the views and procedures that define what you want the application to be able to do (ie: it’s basically a contract between the database and the application)
Create a new user (from a SQL login or, better-still, a domain service account) for the application to connect through.
Grant that user EXECUTE and SELECT permission on the WebApp schema (and nothing else)
Then if the application is trampled on, the most that it can do is the list of things that you’ve defined in that schema and nothing else.
We need to start building systems more defensively, and this is reason #82938429 for why I just don’t like most ORMs as they tend to encourage entirely the wrong behavior in this area. (Some let you do it better begrudgingly).
There were a number of new T-SQL functions introduced in SQL Server 2012. As I've mentioned before, I get excited when there are new T-SQL functions.
Some, I'm not so excited about. EOMONTH was in that category, not because of the functionality, but because of the name (wish it was ENDOFMONTH), and lack of symmetry (lack of a STARTOFMONTH or BEGINNINGOFMONTH).
One that I thought was curious was CONCAT. I thought "why on earth do we need a function to concatenate strings. I can already do that. But when I got into using it, I realized how wonderful it was.
The problem with concatenating values is that you first need to convert them to strings. CONCAT does that automatically with all parameters that you pass to it. (It takes a list of values as parameters and you must have at least two parameters). You can see here how I can mix data types in the parameter list:
All the values are implicitly cast to a string. OK, so that's a little bit useful, but still no big deal?
The really powerful aspect is that it ignores NULL parameters in the list. (The documentation says that it implicitly casts them to an empty string but based on discussions I've had with the product group lately, my guess is that it simply ignores any parameter that's NULL).
Now that's something that's much messier with normal T-SQL. The problem is that when you concatenate anything that's NULL with the + operator in T-SQL, the answer is NULL, no matter what the other values are:
But this handles it nicely:
But notice that we're still not quite there. It's a pain to need to specify the separator each time (I've used N' ' as a single unicode space). More of a pain though, is notice that I still have two separators between Tim and Taylor in the example above.
CONCAT_WS in SQL Server 2017 comes to the rescue for that. It lets you specify the separator once, and ignores NULLs in the parameter list, and importantly, doesn't add the separator when the value is NULL. That's a pain if you want to use the function to create a string like a CSV as output (and I wish it had been a parameter to this function) but for this use case, it's perfect.