Minimizing locking while archiving rows from a main table in SQL Server T-SQL

I’ve mentioned before that I really don’t mind working with large tables. They’re only slow if you’re scanning all the rows to find the data you want. But there is still sometimes a valid need to archive data from a main table into an archive table. And people often run into problems, particularly blocking problems, when they try to do that.
Let’s start with an example pair of tables (much simplified for clarity):
USE tempdb;
GO
DROP TABLE IF EXISTS dbo.Transactions;
DROP TABLE IF EXISTS dbo.ArchivedTransactions;
GO
CREATE TABLE dbo.Transactions
(
    TransactionID bigint NOT NULL
        CONSTRAINT PK_dbo_Transactions PRIMARY KEY,
    TransactionDate date NOT NULL,
    Amount decimal(18,2) NOT NULL
);
GO
CREATE INDEX IX_dbo_Transactions_SupportArchive
ON dbo.Transactions (TransactionDate)
INCLUDE (Amount);
GO
CREATE TABLE dbo.ArchivedTransactions
(
    TransactionID bigint NOT NULL
        CONSTRAINT PK_dbo_ArchivedTransactions PRIMARY KEY,
    TransactionDate date NOT NULL,
    Amount decimal(18,2) NOT NULL
);
GO
The wrong way
What I see most people do when they need to archive the data is something like this:
DECLARE @CutoffDate date = '20200101';
INSERT dbo.ArchivedTransactions
    (TransactionID, TransactionDate, Amount)
SELECT TransactionID, TransactionDate, Amount
FROM dbo.Transactions
WHERE TransactionDate < @CutoffDate;
DELETE dbo.Transactions
WHERE TransactionDate < @CutoffDate;
GO
There are two basic problems with this:
- If a statement acquires more than about 5000 row or page locks, then by default SQL Server attempts lock escalation and takes a lock on the whole dbo.Transactions table. (I had a suitable covering index, but the same locking issue would apply if there wasn’t one.) If that happens, there will be issues for other people using the system, including people trying to insert new transactions. So the archiving will probably only be allowed to run at weird hours of the day.
- It’s much more subtle, but unless you’re certain that the rows being archived cannot change, you can’t be sure you’re deleting the same rows you selected. Another session could insert, update, or delete matching rows between the two statements, so you could delete a row you never copied, or copy a row that then changes before it’s deleted.
Sometimes people even wrap the whole thing in a transaction. That can leave both tables tied up in locks until the commit.
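For reference, the transactional variant usually looks something like the sketch below. Note that at the default READ COMMITTED isolation level it still doesn’t guarantee the two statements touch the same rows (the SELECT’s shared locks are released as soon as that statement finishes), while any locks the statements do hold now last until the commit:

```sql
DECLARE @CutoffDate date = '20200101';

BEGIN TRANSACTION;
    -- Both statements now commit or roll back together...
    INSERT dbo.ArchivedTransactions
        (TransactionID, TransactionDate, Amount)
    SELECT TransactionID, TransactionDate, Amount
    FROM dbo.Transactions
    WHERE TransactionDate < @CutoffDate;

    -- ...but exclusive locks taken by the DELETE are held until COMMIT,
    -- so both tables can end up blocked for the duration.
    DELETE dbo.Transactions
    WHERE TransactionDate < @CutoffDate;
COMMIT TRANSACTION;
```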
A better way
Both of these issues can be avoided this way:
DECLARE @CutoffDate date = '20200101';
WHILE (1 = 1)
BEGIN
    DELETE TOP(1000) dbo.Transactions
    OUTPUT deleted.TransactionID,
           deleted.TransactionDate,
           deleted.Amount
    INTO dbo.ArchivedTransactions
        (TransactionID, TransactionDate, Amount)
    WHERE TransactionDate < @CutoffDate;

    IF @@ROWCOUNT < 1000 BREAK;
END;
GO
Limiting each DELETE to 1000 rows keeps the statement well below the roughly 5000-lock escalation threshold, so the table lock is avoided, as long as there is a suitable index to support the delete.
Using the OUTPUT clause of the DELETE statement makes sure that the rows you delete are the same ones that you archive.
If there’s any chance that other sessions’ locks will contend with this loop, you might try using READPAST:
DECLARE @CutoffDate date = '20200101';
WHILE (1 = 1)
BEGIN
    DELETE TOP(1000) dbo.Transactions WITH (READPAST)
    OUTPUT deleted.TransactionID,
           deleted.TransactionDate,
           deleted.Amount
    INTO dbo.ArchivedTransactions
        (TransactionID, TransactionDate, Amount)
    WHERE TransactionDate < @CutoffDate;

    IF @@ROWCOUNT < 1000 BREAK;
END;
GO
There aren’t many places where I use READPAST, but this is one of them. READPAST says: if you come across a row that’s locked, just skip over it instead of blocking. Any skipped rows are simply picked up by a later iteration of the loop (or by the next archiving run) once their locks are released.
A final note: instead of an appropriate non-clustered index like the one I’ve used here, if you can select your blocks of data by whatever the clustering key is, that would be even better, as each batch then becomes a simple range on the clustered index itself.
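As a sketch of that idea, assuming TransactionID is ever-increasing with TransactionDate (as it would be with an IDENTITY column populated in date order — an assumption, not something the schema above enforces), you could look up the boundary key once and then delete by the clustering key:

```sql
DECLARE @CutoffDate date = '20200101';
DECLARE @MaxArchiveID bigint;

-- One pass over the supporting index finds the boundary key.
SELECT @MaxArchiveID = MAX(TransactionID)
FROM dbo.Transactions
WHERE TransactionDate < @CutoffDate;

WHILE (1 = 1)
BEGIN
    -- Each batch is now a range on the clustered primary key,
    -- with no need to touch any non-clustered index to find rows.
    DELETE TOP(1000) dbo.Transactions
    OUTPUT deleted.TransactionID,
           deleted.TransactionDate,
           deleted.Amount
    INTO dbo.ArchivedTransactions
        (TransactionID, TransactionDate, Amount)
    WHERE TransactionID <= @MaxArchiveID;

    IF @@ROWCOUNT < 1000 BREAK;
END;
```

The loop body is otherwise identical to the version above; only the predicate changes, from the date column to the clustering key.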
2020-02-27