SQL: Using UNION when you should use UNION ALL is a performance problem

SQL: Using UNION when you should use UNION ALL is a performance problem

I spend a lot of time reviewing SQL Server T-SQL code from developers. There are quite a few issues that I see regularly, and one of these relates to UNION.

In SQL Server, a UNION statement is actually a UNION DISTINCT (but we’re not allowed to put the word DISTINCT in the syntax). The alternative is a UNION ALL. Often a UNION ALL is what’s needed, and using a UNION leads to unnecessary performance problems.

In the WideWorldImporters database, I can write this UNION ALL based statement:

The problem that I commonly see is that developers write this query with UNION instead of UNION ALL. When they do that, SQL Server has to perform a sort (and DISTINCT) operation across the entire returned rowset, sorting by every column.

That’s ok if you really need to only return distinct values, but most times I see this done for values that cannot be the same.

In this example, even if the customer and supplier have the same name, the other columns will be different so using UNION instead of UNION ALL is very painful. SQL Server is a fast database engine, but there’s probably no need here to have it sorting all the names, URLs, phone and fax numbers, etc. before the query output is produced.

Whenever you are considering a UNION, please consider using UNION ALL if that’s what you really mean.

2018-01-29