I always say that one of the things that I love about consulting or mentoring work is that I see things (mostly code) that I would have never have thought of, both good and bad. You could spend all day dreaming up bizarre ideas and never come close to the ones that I just happen to come across.
A good example of this was a site I was at a while back where every table had a GUID name. Yes, I'm talking about tables named dbo.[3B38AB7E-FB80-4E56-9E5A-6ECED7A8FA17] and so on. They had tens of thousands of tables named this way. Query plans were close to unreadable.
Another was a site where the databases were named by their location on the disk. Yes, they had a database named X:\Database Files\Data Files\CoreDB.mdf. And yes that does mean that you end up using it like:
Not all odd things that I see are so blatant though. Today I saw a more subtle coding issue. With SQL Server the UNION operation combines to rowsets into a single rowset. If UNION ALL is used then all rows are returned. With just UNION without the ALL, only distinct rows are returned. All good so far.
But until today, I'd never stopped to think about what happens when you mix the two operations. For example, without running the code (or reading further ahead yet), what would you expect the output of the following command to be? (The real code read from a table but I've mocked it up with a VALUES clause to make it easier to see the outcome).
I was left wondering if there was some type of operation precedence between UNION and UNION ALL. The output rows are:
It isn't a case of precedence. The operations are just being applied in order. You can see this as follows:
Executing the first part:
returns the following with no surprises:
Executing the first two parts:
returns all rows from both queries:
Executing the first three parts:
returns the following rows. This is formed by taking the previous result, executing the third query then performing a distinct operation on the whole combined rowset.
Executing the entire query then takes this previous result set and appends (based on the UNION ALL), the results from the fourth part of the query.
Regardless of how it actually works, I think it's important to avoid writing code where the outcome is less than obvious. In this case, it was just a bug but if the code as written was the intended code, adding some parentheses to this query might have made the intent clearer.
And of course in this case, a sprinkle of extra DISTINCT and GROUP BY operations made it a thing of beauty:
which actually returned:
So what they really should have written in the first place was:
<sigh>
I recently put together some code at work that involved UNIONing several tables and then EXCEPTing from another. I tried putting them all together at first except it obviously didn't work – the same thing you observed. I guess this is just how all table operators work in SQL Server.
@Dan – actually, you can do that – you can control UNION, EXCEPT and INTERSECT precedence using brackets.
e.g.
(
select…
union
select …
)
except
select …
Interesting… do you know if this order is consistent, or will it depend (like join orders) on whatever the optimiser thinks is a good idea from one run to another?
I used to work in a place where the original code was written in Italian… then maintained and improved and partial rewrites in English..
The order of logical operations is consistent, so in my example above, the first and second selects would always union, and the except would operate on the result of that union. The optimizer may choose different strategies for achieving that outcome, but the result would remain the same. I'm sure that behavior is documented somewhere in BOL, but I can't seem to find it right now.
FYI:
INTERSECT takes precedence over UNION (ALL) and EXCEPT meaning INTERSECT is evaluated first then UNION (ALL) and EXCEPT both of which have the same precedence and are left associative meaning they are evaluated from left to right (top to bottom). INTERSECT is also left associative.
e.g.
select 1 union all select 2 union all select 3 intersect select 3 except select 3
returns the set 1 and 2.
If all the set operators had the same precedence the above query would return the empty set. This can be simulated with the following query using brackets:
(select 1 union all select 2 union all select 3) intersect select 3 except select 3
In my example it doesn't matter whether you use UNION or UNION ALL since there are no overlapping values. The query optimizer can choose whatever path it finds best as long as the order of precedence is met.
My take on it? Don't leave it to chance. Use parentheses.
The ANSI/ISO Standard is to do set operators from left to right, following the usual parentheses rules. This is not the usual mathematical convention form Set Theory. And I have no idea what we did it that way, before you ask.