SQL: Developers need to stop rewriting history

I often deal with quite large databases. There are only two things about big databases that trouble me. One is queries that need to read the whole database to find anything. The other is big databases that are still designed like small databases.

Every week, I still see massive SQL Server databases where all of the data is in a single PRIMARY filegroup. That's a problem but often the people looking after the databases can't do much to improve that situation.

One of the most important things that's needed in a large database is an effective strategy for separating out old data. My preferred tool of choice for that is partitioning.

I was at a bank in Melbourne where they had a 1.9TB table that was being replicated to Singapore over a slow link. Things were ok, right up until it ever needed to be reinitialized. Then the issue was about how to get all that data up to the other end of the link. Their current solution was to back it up, hand it to a staff member, and say "Have fun in Singapore".

Awesome image by Chen Hu
Awesome image by Chen Hu

But ask yourself how much of that table is likely to ever be modified. We partitioned the table by month, made all the old partitions highly compressed and read-only, and the only read-write partition was a few hundred MB.

If they needed to reinitialize, they only need to copy up that last partition as they know that all the rest of the data is already at the other end, and is unchanged.

This is a great outcome but only works because we were able to freeze most of the table as read-only.

That needs good design work from the developers.

I've worked on other systems where large transaction tables could never be frozen like this. By design, if a transaction was reversed, they'd go back and change the original transaction to flag it as reversed.

This is not sensible.

Our systems should not be designed that way. It's important for developers of large tables to consider how these tables will be used.

An effective archiving strategy depends upon immutable historical data.

Winston Churchill said "History will be kind to me for I intend to write it." We, on the other hand, should not be rewriting history in our databases.

 

Leave a Reply

Your email address will not be published.