SQL: NEWSEQUENTIALID – Nice try but missed the target

Miss the Target

GUIDs (uniqueidentifiers in SQL Server) are well understood to cause fragmentation issues when used as clustering keys for SQL Server tables. It's easy to understand why, but is NEWSEQUENTIALID the answer? I don't think so. Here's why.

When a SQL Server table has a clustered index, it needs to maintain the logical sequence of rows in the order of the clustering key. There are a few important notes here:

  • I've mentioned key, not column. The key might involve more than one column.
  • I'm not talking about primary keys. While the default for SQL Server is to make primary keys clustered if there isn't already another clustering key, primary keys are a logical construct unrelated to how the table data is stored.
  • I'm not talking about physical storage. I often hear people say that clustered indexes determine how data is physically stored. That was in old Microsoft courseware, etc. and just isn't true. It's about logical ordering. If not, there would be no fragmentation and changing the data would take forever.
  • The problem with using GUIDs as clustering keys is that they are typically generated in a random order using a function like NEWID() or using a function like System.Guid.NewGuid() in .NET (or similar functions in other languages).

Let's take a look at this by creating a table and populating it:

I've used NEWID() to populate the ImportantFactID column, and because we've declared that column as a primary key, and no other index is clustered, it will also be the clustering key. So the table will be clustered on ImportantFactID.

Now let's look at the values that have been inserted:

Clearly the table data is being inserted in quite a random order. And this is what leads to the fragmentation.

So, it would seem sensible that we'd want to insert data in order. Mostly we do want that. There are some very high-end scenarios where the random inserts are better, but for most people, they are a problem.

Because this had become a common problem, SQL Server 2005 introduced NEWSEQUENTIALID(). It produces values that look like GUIDs but always increase in value.

Notice that you can directly create a value with NEWID():

But you can't do that with NEWSEQUENTIALID():

NEWSEQUENTIALID() can basically only be used as a column default. Fair enough, but what do the values look like?

Let's recreate the table and find out:

And let's look at the data that was inserted:

The first thing that should strike you is that the values are now nothing like random. They are increasing but note that they are also predictable. That mightn't concern you but I work at many sites where some reliance is placed on people not being able to guess GUID values. Clearly this isn't the case with NEWSEQUENTIALID().

If you also wonder if you are now more likely to have a collision on generated values, the answer is "yes".

But is the concept of them being ever increasing the answer to our problem?

I'd argue that it isn't. You need to consider why people use GUID values in the first place. 99% of the time, it's so that any layer of code can generate an ID without having to refer to a central ID controller. I can write an application where I generate an order ID, and attach order details to that order ID, and then throw the lot into the database without concern about colliding with something else, and without needing to make a round-trip to the database to first get a new ID.

As soon as I'm using NEWSEQUENTIALID(), at least as soon as I'm using the T-SQL function, I need to go to the database to get an ID as it's only available as a column default. If I'm going to the database to get an ID, what advantage does a pseudo-GUID give me over just getting a bigint? You might as well just get a bigint.

For me, the T-SQL NEWSEQUENTIALID() function completely misses the point of why people use GUID values in the first place and it seems pointless to me.

Automatically increasing GUID values might have a place, but if they do, it's only the ones that are generated in upper layers of code, not those generated in the database.

My verdict is that NEWSEQUENTIALID() is in the category of "nice try, seemed a good idea at the time, but completely missed the target".

6 thoughts on “SQL: NEWSEQUENTIALID – Nice try but missed the target”

  1. Interesting post.
    The documentation of uniqueidentifier says:
    "Merge replication and transactional replication with updating subscriptions use uniqueidentifier columns to guarantee that rows are uniquely identified across multiple copies of the table."

    So for instance in the use case of syncing data from your mobile(s) to a central database, the newsequentialid() still has its use over using a bigint.

    Would you happen to know if there is a technical reason why SQL Server does not allow to do a select NEWSEQUENTIALID()?

    1. No idea on why you can't do a SELECT on NEWSEQUENTIALID(). My guess is that because it's only a small range and keeps increasing, that burning them as part of SELECT operations probably isn't a good idea. But that's just speculation.

      Merge replication is renowned for being slow. It's just not the best way to do that any more. There are so many other options but merge replication is the easy built-in one. I'm often at sites where merge replication has been used in volume, and they end up with merge storms. You also need to spend time performance tuning the sp_MSSetupBelongs stored procedure. (Name off the top of my head)

      And worth noting that even though updating subscribers to transactional replication is mentioned, it's now deprecated and shouldn't be used.

  2. I think you completely misunderstood the use case for SEQUENTIAL guids. The idea is that you want a universally unique ID but you also don't want the fragmentation of the indices that a "normal" guid creates. SEQUENTIAL guid's solves both these problems. Generating SEQUENTIAL guid's from code is also possible by calling UuidCreateSequential in rpcrt4.dll in Windows.

    1. Not at all. I totally get why they thought it would be of value but I think it misses the mark. UuidCreateSequential (or any sort of COMB GUID) is ok to call from application code, even with all its failings. But that's done at the app layer, and is of value because it avoids the need to go to a central "number-issuer" like a database first.

      NEWSEQUENTIALID however, can only be used as a database column default. By itself, it's far more likely than a GUID to have a collision, and it has known potential issues with duplicates after reboots.

      More importantly though, if you are going to go to the database to get a value, what is the point of getting one that's not even a good GUID? What is the advantage of that over a bigint? You'd be much better picking up a value from a bigint sequence. And if cross-machine collisions are a concern, you're also going to be better off with a partitioned sequence across those same machines.

      1. It is not far more likely to have a collision, I don't know where you get that from. Neither is there any risk of duplicates after reboot. The guid is guaranteed to be globally unique just like a normal guid. Just because you can guess the next id by looking at an existing does not mean that a guid generator algorithm would be any more likely to generate a collision to that guid than any other guid.

        Also I don't see a problem with returning the generated ID when doing an insert if you need it back.
        As for comparing with big int that is a completely different use case, if I don't require any global uniqueness I will of course go with int.

        1. Any time you constrain the possible next values, you are, by definition, increasing the chance of a collision. After a reboot, these values can start again with a lower value. There is no guarantee of global uniqueness. That's described on the documentation page for the function. Uniqueness of these is based on MAC addresses for the system being unique. That cannot be relied upon. Some systems don't even have MAC addresses but there are also numerous examples of duplicate MAC addresses in both physical and virtual network adapters. Normally that's only a problem if you strike two of them in the same subnet but if you are using them for global uniqueness, that's not valid.

          But regardless, a bigint is going to be far more efficient to work with than any type of uniqueidentifier value, so why use it? If the value was generated in the application tier, that totally makes sense. But if the way it's generated is to go to the database to get a value, what is the point of returning a value that's awkward to work with, and less efficient to work with? You could just as easily number each of your servers, and give each say a non-overlapping range of a thousand billion bigints. Your system would be easier to work with and faster.

          Avoiding round trips by doing this in the application tier though, is a completely different story. But NEWSEQUENTIALID can't be used that way.

Leave a Reply

Your email address will not be published. Required fields are marked *