SQL: What on earth is Halloween protection?

If you’ve worked with SQL Server (or with databases in general) for any length of time, you will have heard someone refer to “Halloween protection”. It refers to a problem where update operations could inadvertently end up modifying a row more than once.
I saw someone who was pretty knowledgeable complaining about just this problem recently. He was using a dynamic cursor in SQL Server. The problem was that he was progressing along an index, but when a row was modified, it was relocated to a position in front of where he was processing, so it got processed again.
To avoid this, we need to ensure that any update operation reads the data first, and keeps a list of the values that were read, before starting to do the update. For standard updates, SQL Server does this automatically. But this guy had fallen foul of the problem by using a dynamic cursor. If he had used a static cursor (which would have had horrid performance), he would have been ok. It’s gets a copy of all the data before starting.
It’s not all about cursors though. Let’s look at how SQL Server handles this in typical updates with an example:
The query plan for that is quite straightforward:
If I clustered the table on the SalePrice though, that would be a problem, and you can see how the query plan changes. Note the additional query plan step:
Lots of queries have this extra table spool step, and for similar reasons. It’s not exactly the same problem but look at this query’s plan:
In most high-level languages, you can’t just say A = B and B = A to swap values over, but it works just fine in SQL. That’s because SQL Server has read the values, and has a copy of them from before the changes were made.
The name Halloween protection comes from when the problem occurred during research on the System R project at the IBM Almaden Research Center back in 1976. So the name comes from when it happened, rather than from anything about the problem itself. The researchers ( Don Chamberlin, Pat Selinger, and Morton Astrahan) ran a query to raise everyone’s salary by 10 percent, only if they had a salary of under $25000. They were suprised to find that after the query had completed, no-one had a salary of less than that, because the rows had been updated more than once, until they reached the $25000 value.
The name has been used to describe this problem ever since.
2019-02-01