T-SQL 101: 111 Using UNION and UNION ALL

There are times when you need to connect together two sets of results into a single result set. The UNION statement is the way we do that.
In the example shown above, I have two SELECT queries. Notice that I could just highlight either one of them, and run them, and I’d see those values. But if I run the whole query, I’ll get back a single set of results that combines data from each of the queries. Note the results:
You can see that PK8 came from the dbo.Packages table while PG8 came from the dbo.ProductGroups table.
Aliases
When you execute a UNION statement, the column names (or aliases) are taken from the first query. It doesn’t matter what you call the columns in the second (or third) query, those names are ignored.
What is required is that the same number of columns have to be in place in both row sets, and they need to be compatible data types. For example, column 1 in the second row set must be able to be implicitly cast into the data type for column 1 in the first row set. And same for column 2, etc.
UNION vs UNION ALL
What happens when you just use the operator UNION, is to combine the two sets of results, and then remove any of the duplicates. It effectively implements a DISTINCT operation across the results.
Note that if rowset 2 had two identical rows of data, those duplicates will be removed as well. It’s not just a case of removing rows that are common between the two row sets.
To avoid this DISTINCT operation happening, you can say UNION ALL instead. That tells SQL Server to just give you all the rows from both row sets.
Performance Issues
A really common performance issue when writing T-SQL queries, is to write UNION when UNION ALL would have been suitable. The DISTINCT operation can have a serious performance impact.
Learning T-SQL
It’s worth your while becoming proficient in SQL. If you’d like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.
2025-02-08