SQL: NEWSEQUENTIALID - Nice try but missed the target

GUIDs (uniqueidentifiers in SQL Server) are well understood to cause fragmentation issues when used as clustering keys for SQL Server tables. It’s easy to understand why, but is NEWSEQUENTIALID the answer? I don’t think so. Here’s why.
When a SQL Server table has a clustered index, it needs to maintain the logical sequence of rows in the order of the clustering key. There are a few important notes here:
- I’ve mentioned key, not column. The key might involve more than one column.
- I’m not talking about primary keys. While the default for SQL Server is to make primary keys clustered if there isn’t already another clustering key, primary keys are a logical construct unrelated to how the table data is stored.
- I’m not talking about physical storage. I often hear people say that clustered indexes determine how data is physically stored. That was in old Microsoft courseware, etc. and just isn’t true. It’s about logical ordering. If not, there would be no fragmentation and changing the data would take forever.
- The problem with using GUIDs as clustering keys is that they are typically generated in a random order using a function like NEWID() or using a function like System.Guid.NewGuid() in .NET (or similar functions in other languages).
Let’s take a look at this by creating a table and populating it:
I’ve used NEWID() to populate the ImportantFactID column, and because we’ve declared that column as a primary key, and no other index is clustered, it will also be the clustering key. So the table will be clustered on ImportantFactID.
Now let’s look at the values that have been inserted:
Clearly the table data is being inserted in quite a random order. And this is what leads to the fragmentation.
So, it would seem sensible that we’d want to insert data in order. Mostly we do want that. There are some very high-end scenarios where the random inserts are better, but for most people, they are a problem.
Because this had become a common problem, SQL Server 2005 introduced NEWSEQUENTIALID(). It produces values that look like GUIDs but always increase in value.
Notice that you can directly create a value with NEWID():
But you can’t do that with NEWSEQUENTIALID():
NEWSEQUENTIALID() can basically only be used as a column default. Fair enough, but what do the values look like?
Let’s recreate the table and find out:
And let’s look at the data that was inserted:
The first thing that should strike you is that the values are now nothing like random. They are increasing but note that they are also predictable. That mightn’t concern you but I work at many sites where some reliance is placed on people not being able to guess GUID values. Clearly this isn’t the case with NEWSEQUENTIALID().
If you also wonder if you are now more likely to have a collision on generated values, the answer is “yes”.
But is the concept of them being ever increasing the answer to our problem?
I’d argue that it isn’t. You need to consider why people use GUID values in the first place. 99% of the time, it’s so that any layer of code can generate an ID without having to refer to a central ID controller. I can write an application where I generate an order ID, and attach order details to that order ID, and then throw the lot into the database without concern about colliding with something else, and without needing to make a round-trip to the database to first get a new ID.
As soon as I’m using NEWSEQUENTIALID(), at least as soon as I’m using the T-SQL function, I need to go to the database to get an ID as it’s only available as a column default. If I’m going to the database to get an ID, what advantage does a pseudo-GUID give me over just getting a bigint? You might as well just get a bigint.
For me, the T-SQL NEWSEQUENTIALID() function completely misses the point of why people use GUID values in the first place and it seems pointless to me.
Automatically increasing GUID values might have a place, but if they do, it’s only the ones that are generated in upper layers of code, not those generated in the database.
My verdict is that NEWSEQUENTIALID() is in the category of “nice try, seemed a good idea at the time, but completely missed the target”.
2017-10-26