SQL: How do I choose a clustering key in SQL Server?

Clustered indexes vs heaps

OK so you know that a table can either be a heap or it can have a clustered index.  The first question of course, is which should I use?

If you have any doubt at all, put a clustered index on the table. Most SQL server tables will end up performing better that way. There are a few scenarios like log files where heaps will be preferable but if you're not sure, start with a clustered index.

But which column or columns should I cluster on?

That then raises the question of what sort of keys or which columns are really the best to choose for that clustering index, and of course, like many things in SQL server, the answer is that "it depends on how it's being used".

For example, if you're inserting data, and you're looking at insert performance in the table, the choices you make will affect this. For this operation, I'd start by saying that the value needs to be static, because if the value changes, then that means the row has to move. That's almost always a bad thing. Because the rows are kept in a logical sequence, you do not want to use a value that ends up changing, at all.

The second thing is, you want a short value. This value is going to be all throughout the index. Every single thing in the table is going to be sorted by that, and it's the value that's going to be present in the leaf level of every nonclustered index as well. You don't want long values here.

Large key values like long strings, will not perform well for this.

You also want the values to be unique. If SQL server doesn't know that your values are unique, it will automatically add some additional data that we call a uniqueifier to make sure that it can uniquely identify, every single row. Your world will be better if it knows that they are actually unique.

Finally, it might be helpful if the values are increasing. It doesn't have to be what's called monotonically increasing (that is going up by the same amount each time), but just always getting bigger. What that means is that data will just keep getting added to the end of the table. Most of the time that's going to be helpful.

There's always an "it depends"

There are some very high performance scenarios where the end of the table could then start to become a hotspot and can be a problem because all the inserts are happening there. But for the vast majority of tables you'll be better off having the data that gets added to the table, being added to the end of the table and an increasing clustering key will do that for you.

Learn about indexing

Want to learn more about indexing? Take our online on-demand course now:






Learning Mandarin: Tones used in Mandarin

I mentioned in an earlier post that Chinese dialects are tonal. As well as words being different in different dialects (but often still the same characters), the tones are also different for different dialects. Mandarin is generally considered to have four tones or  声调 (Shēngdiào) plus a neutral tone. So, some would describe that as a total of five tones.

The first tone or 第一声 (Dì yī shēng) is often drawn as level but it's actually both high and level.

An example is  (Mā) meaning mother. The second tone or 第二声 (Dì èr shēng) is a rising sound.

An example is  (Má) which is a word for hemp. The third tone or 第三声 (Dì sān shēng) is a falling then rising sound.

An example is  (Mǎ) which means horse. The fourth tone or 第四声 (Dì sì shēng) is a sharp falling sound.

An example is 骂 (Mà) which means to curse or is a curse.

The neutral tone is both central and flat. An example is  (Ma) which is like a question mark. In the main image above (by Lufti Gaos), you can see yet another "ma". The second word from the top means "code" and is also 3rd tone. (The sentence basically means "scan code, use bike". The last word is more like "vehicle" but here it'll be an abbreviation of the word for bike).

It's important to understand that Chinese people don't hear the above words like variations of the word "ma". They hear them as different words.

Learning Mandarin

I'll write more soon on the best methods for learning. If you want to get a taste for it in the meantime though, my current favorite is Tutor Ming. If you decide to try it, click here and it's a bit cheaper for you, and for me.

Book Review: Shrill (Notes from a loud woman)

I've also tackled a number of non-technical and non-scientific books recently. On that I'd heard good things about was Shrill (Notes from a loud woman) by Lindy West.

First note about this book is that it's full of NSFW (not suitable for work) words, so be warned if that would offend you.

As well as being loud, Lindy has spent her life being big. I'd say that's shaped her world view as much as anything else. Having been big much of my own life, I can directly relate to so many things she says.

Weight is one of the last great "acceptable" forms of discrimination in the world today. That's largely because when doctors and others haven't been able to cure it, they have to blame someone, so they blame the victim, and presume they must have just eaten too much, not exercised enough, or just don't have any willpower. Yet of my many friends, the heavy people that I are often the ones with the strongest willpower, and often excelling at every aspect of their lives except weight. One day, society will come to understand that obesity is an illness, and I think largely caused by excess carbohydrates, even though that's exactly what stupid guidelines have told us to eat over the years. But I digress.

Lindy had me almost crying at times, yet laughing so hard at other times. An NPR review called her "our fat, ferocious, and funny avenging angel". I'd put that as pretty close to the mark.

She's dealt with appalling situations and managed to still come back out with a sense of humor. It's surprising that she still has  decency towards those around her.

Bottom line?

I liked it. She's brash and loud, willing to challenge stereotypes, and tackle nonsense head-on.

Greg's rating: 8 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway 🙂

Shortcut: Zooming and navigating execution plans in SSMS

SQL Server execution plans can become quite large. That makes them hard to navigate because you are endlessly scrolling around the results pane in SQL Server Management Studio (SSMS).

The pane does have some zoom features. Note that if I right-click in the whitespace, I get these options:

So I can zoom in and out, set a custom zoom level, or zoom until the entire plan fits. Generally though, that would make the plan too small to read, as soon as you have a complicated plan.

But in one of the least discoverable UI features in SSMS, there is an option to pan around the plan.

If you click and hold that little + sign, you'll find you can pan around within the plan:

That's very nice but I think it needs to be a little easier to find.

SDU Tools: Script SQL Server Database Users

In our free SDU Tools for developers and DBAs, there's a tool that makes it easy to create a script for creating SQL Server database users.  No surprise, it's called ScriptDatabaseUsers.

One key advantage of having these scripting procedures and functions is that you incorporate them into other applications, and programmatically perform the scripting if you need to.

But if you just want to create a script, that works too.

You can see it in use in the main image above, and you can watch it in action here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:


Happy new year from SQL Down Under and from me

Hi Folks,

Just a short note today to wish everyone a happy new year. I know that new year is a time when many people make resolutions, with the intention of changing something about themselves during the upcoming year. It makes it a time full of hope for fresh starts.

I don't tend to make too many resolutions as I think that needs to happen on an ongoing basis rather than once per year, but I understand why people do.

From a company point of view, these are things you are likely to see from us this year:

  • Many new online and on-demand courses. (I know that many of our customers like in-person courses better but this is the low-cost and fast way to learn some topics, compared to waiting for us to run them in-person. And we've put a lot of effort into making sure the experience is as close to the in-person experience as it can be, with hands-on labs, quizzes, etc.) You'll find them at http://training.sqldownunder.com
  • New podcasts. I try to create a number of podcasts when a new version of SQL Server is about to appear. That gives me a chance to discuss the concepts behind the product features with the people who know about them, right while they're still finalizing them. You'll find the first SQL Server 2019 podcast with Argenis Fernandez here: https://sqldownunder.com/pages/sql-down-under-podcast
  • More in-person classes. Most will be in Melbourne, but it depends upon demand. Query Performance Tuning and Advanced T-SQL is one of our all-time favorites. It's running in Melbourne early next month. Early bird pricing is available now. Would love to see you there. If you can't make the full 5 days, you can do 2 days of Query Performance Tuning or 3 days of Advanced T-SQL separately. You'll find more here: https://sqldownunder.com/pages/sql-server-query-performance-tuning-and-advanced-t-sql-5-days, here: https://sqldownunder.com/pages/queryperformancetuning, and here: https://sqldownunder.com/pages/sql-server-advanced-t-sql-3-days
  • New eBooks. I have a couple of these planned. We'll see how many get out the door this year but if you don't have the current SSMS Tips and Tricks one, you'll find it here: http://ssmsbook.sqldownunder.com
  • Many more of our free developer and DBA tools: SDU Tools. Even if you aren't wanting a full set of tools, these are great examples of how to do things in SQL Server using T-SQL. If you don't have them yet, look here: http://sdutools.sqldownunder.com Version 13 is out now, and we've already got some great additions coming in Version 14.

Regardless, I hope you are all safe and well, and have an awesome new year!


SQL: Is indexing bit columns useful in SQL Server?

One data type that's often been quite controversial is the bit data type. So a bit is where we have an individual value, which is either a zero or one. They're the only values allowed unless it's NULL.

Always keep in mind when you're working with bits in SQL server that they're aren't just 2 potential states, there are three. It can be zero, or one, or it can be NULL. That is, it can have no value at all.

A common misconception

Bit values can be indexed and contrary to many misconceptions that have been around over the years, indexing bits can be very, very useful.

Now I used to see a lot of Microsoft material, that said there is absolutely no point indexing bit columns because they can only have two values. But the issue is not the number of values that a column can have, the issue is how selective the value is.

Why did they think that?

Let me explain the thinking that led to this. Imagine I had a table that's holding the people who are attending a conference and we have an indication of whether or not they've previously attended the same conference. Let's say that 50% of them have been to the conference before. Now having an index on that column probably isn't going to be very useful, as that's not at all selective.

To be selective, you need something that's quite tiny in terms of the number of rows returned compared to the total number of rows in the table.

But let's consider a different situation. If I have a transaction table and it contains billions of transactions, 500 of which are unfinalized. I could have a column in the table called IsFinalized that is a bit data type. (My purist friends would argue that's not normalized properly but run with me on this for now). I would never use the index on that column to find all the ones that are finalized, because that's most of the table.

However, if I ever went looking for the unfinalized transactions, I most certainly would hope there's an index there to let me find those 500 unfinalized ones.

It would also make a good candidate for a filtered index, but that's a topic for another day.

Why is the idea so common?

The idea that indexing bits isn't useful started back in early versions of SQL Server. In SQL Server 7, you couldn't build an index on a bit column. In SQL Server 2000, you could do it via T-SQL but the tools like Enterprise Manager didn't give you the option. Even the SQL Server training materials from Microsoft said it wasn't useful.

So people got the idea that it wasn't useful. But that was never true.

Indexing bit columns can be very, very powerful where the data is selective.

Learn about indexing

Want to learn more about indexing? Take our online on-demand course now:


Learning Mandarin: Chinese is a tonal language – what does that mean?

One of the challenges for newcomers learning any Chinese dialect (including Mandarin), is that the languages are tonal languages. In English, we tend to use tone for emphasis but it's not really used to change to a different word. 

For example, if I said "he really wants to go", if I draw out and more heavily the word "really", I can make it sound emphatic rather than casual.

The exceptions to this would be homophones, where we have words that are spelled the same but entirely different meanings. A simple example would be "lead" where you can "lead a horse to water" but also "go down like a lead balloon".

In Chinese, words that have the same Pīnyīn letters but different tones, are usually (but not always) entirely different words. A common example is "ma", which depending upon tones can mean a horse, a mother, a question mark, and many more things. Even when the tones are the same, it can mean many different things, and you need context to make sense of it.

Because of this, what I found difficult at first, is talking to Chinese people. The problem is that if they hear "ma" with the first tone, and "ma" with the third tone, they don't hear a badly pronounced word, they hear an entirely different word. There's no mental connection between the two.

In English, that's not so much of an issue. For example, if I said "there is lead in the petrol" but pronounced "lead" like the one in "lead a horse to water", an English speaker can still understand you, and will just think you have a lousy accent. We have a mental connection between those two words.

I find that Chinese speakers do not have this mental connection. It's probably because we relate the two different words to the same spelling, whereas "ma" and "ma" are often entirely different characters in Hanzi (or Chinese characters):

It's only in Pīnyīn that the letters seem related. To a speaker, they aren't related at all. Only words with the same letters and tones seem related this way.

Learning Mandarin

I'll write more soon on the best methods for learning. If you want to get a taste for it in the meantime though, my current favorite is Tutor Ming. If you decide to try it, click here and it's a bit cheaper for you, and for me.

Book Review: The Selfish Gene (40th Anniversary Edition)

Another classic book that I've gone through again lately is The Selfish Gene  by Richard Dawkins. I listened to the 40th Anniversary Edition on Audible as he was narrating it himself, and I particularly wanted to listen to the 40th Anniversary Edition to see how his own thoughts had changed over time.

Dawkins can be a polarizing figure. I'm mindful of how his work would have been received at the time it was written. I understand many of the comments that people make on him, but I find the vast majority are either misunderstanding him (you need to take him very literally but I find people read into his words, things that he's not actually saying), or are feeling like their deep-held beliefs are being severely challenged. I have no doubt that the latter hurts.

It's hard to believe that this was written 40 years ago, as the ideas are still fairly fresh and interesting to listen to. I liked the way that, instead of fixing his earlier ideas on the fly, he read the book as it was originally written, then inserted final paragraphs after each chapter if necessary to show where he'd updated his thinking or where he now thinks he was wrong. Being prepared to do so, is one sign of a true scientist. Many others would have just updated the content to current thinking.

One the whole though, I think he's done a great service to humanity, in helping us break out of medieval thinking.

This book explains his thoughts on many aspects of evolutionary biology. He dives into genes as persistent units of information, and the bodies that they inhabit, are basically vehicles chosen for their ability to aid in the replication of the genes.

One thing that I didn't particularly like is that in a number of places, he really belabors the points being made. I presume that more modern eyes don't need the reinforcement that readers 40 years ago would have needed. I also find that the analogies start to break down the further they are stretched.

The cover said "The most inspiring science book of all time". That would be a big call, and I can think of others that should have that moniker instead.

Bottom line?

This book is interesting and I understand that many would find the concepts challenging to their world views. It must have been ever so challenging at the time the book was written.

Greg's rating: 8 out of 10

Note: as an Amazon Associate I earn from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway 🙂

Shortcut: Saving and sharing SQL Server deadlock graphs

In an earlier post, I described how query plans could be saved as .sqlplan file, shared, and loaded again in SQL Server Management Studio (SSMS). It's also possible to extract them out of SQL Server Profiler or Extended Events Profiler.

This is useful, but the same applies to deadlock graphs. SQL Server 2005 added Deadlock graph as a type of event in SQL Server Profiler. (It's also part of Extended Events Profiler).

If I open a new trace in Profiler, I can add Deadlock graph to the list of events:

Then when a deadlock occurs, we can see an entry for it in the trace:

Now if you want to share that with someone else, you can right-click the deadlock graph event and extract the event data:

When you do this for a deadlock, the output is saved as a .xdl file extension.

You can then send it to someone else and they can load it in SSMS:

One important thing to note is that a deadlock graph is actually a list of deadlocks, not just one deadlock. When a deadlock graph is opened in SSMS, it only shows the first deadlock in the graph.