This week, one of my colleagues was complaining that he had to do work on a database where every table had identity columns, and they all started at large negative values.
I remember some people promoting this years ago, to double the available numeric range for the data types, rather than just using positive values.
I didn't like it when I saw it then, and I still don't like it.
Let's take a quick look at an example:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
USE tempdb; GO DROP TABLE IF EXISTS dbo.Customers; GO CREATE TABLE dbo.Customers ( CustomerID int IDENTITY(-2147483647, 1) NOT NULL CONSTRAINT PK_dbo_Customers PRIMARY KEY, CustomerName nvarchar(50) NOT NULL, CreditLimit decimal(18,2) NOT NULL ); GO INSERT dbo.Customers ( CustomerName, CreditLimit ) VALUES (N'Johnny Nobody & Sons', 100000), (N'Yet another company', 50000), (N'Dodgy Brothers', 0); GO SELECT * FROM dbo.Customers; GO |
The output from that is shown in the main image above.
Yes, I understand that you can have 4,294,967,295 values instead of "only" 2,147,483,647 values. But what you've also done is make the table a pain in the neck for anyone trying to work on it. Machines aren't the only audience for these numbers; humans need to work on them too.
There are freaks of nature who can remember numbers like this, but I'm certainly not one of them, and I'm not alone.
Worse, they had consistently applied this "technique" across the entire database, even on tables that only ever had a couple of hundred rows. So they've made maintenance on the tables much harder, for no reason at all.
Very few tables are going to have more than 2 billion rows, and if they do, just use a bigint. And as for starting bigint values at -9,223,372,036,854,775,808, just don't.
And if you're using sequences instead (like we mostly do), don't do it there either. You also don't need to squeeze the last value out and use zero either.
Finally, it's important to consider that these large negative values don't compress as well, so if you're using at least row compression (which you should all be using at a minimum), the values are a problem for that too.
So look, please just don't do this.
Hi Greg, can you expand on "And if you're using sequences instead (like we mostly do)" ?
Hey Mitch, we tend to always use sequences instead of identity columns nowadays. They have many advantages. For a start, they can be used across tables, so I can have a single BookingID for a CarBookings, FlightBookings, HotelBookings set of tables. Second, they work well across linked servers. SET IDENTITY_INSERT blah ON doesn't work on a linked server if I have to move data, and it avoids the need to be using DBCC commands to reset the next value to be used after importing data. You have better control over the caching of them, how they recycle, max values, etc. etc.
I'll put up a post about it soon.
Correct me if I'm wrong, but I thought sequences didn't take part in transactions?
Hi Mitch, they don't but neither do identity columns. There's no guarantee with identity columns that you won't have gaps. They can also be caused by restarts, as some values are cached.
Another good reason not to start with negatives is to "save" those values for future use. I worked with a system that used 32-bit ints for IDs and whatnot, but no one ever really thought about how many there would be – until one day it overflowed and the system stopped working. Obviously the solution is to use 64-bit ints – but that's a code change, a database change, etc, which is hard to do when the customer is running a heavily-customized variant of an obsolete version of the software, and it needs to be fixed TODAY.
So instead, they just put a big negative into the "next ID number" table, buying themselves another 2 billion ID numbers, and several years to actually fix the problem. (I didn't stay there long enough to see whether anyone actually got the customer upgraded to a 64-bit version, or just figured "Eh, don't worry about it – our techies can always come up with a workaround, just like they did last time")
Hi Mark, sounds like a plan. But I'd be monitoring it and moving to bigint long before that happened. Regards, Greg
I don't agree with your conclusion that large negative values doesn't compress well.
See https://www.sqltopia.com/downside-of-using-negative-values/
Intereresting Peter. When I'd tried ROW in the past, I'd seen differences. Did you manage to try a CCI ? There were comments about it but that's one that I wouldn't have expected to see a difference on.
Other points still apply for me regardless. (And obviously to others from the amount of feedback I've had).
Heh… awwww maaaannnn, Peter! You posted your code as graphics?
I used to do the same because the blog tools I used at the time wouldn't allow me to have half-decent looking code. But the one I'm using in recent years does it just fine.