Mixing UNION and UNION ALL and other oddities seen in consulting

I always say that one of the things that I love about consulting or mentoring work is that I see things (mostly code) that I would have never have thought of, both good and bad. You could spend all day dreaming up bizarre ideas and never come close to the ones that I just happen to come across.

A good example of this was a site I was at a while back where every table had a GUID name. Yes, I’m talking about tables named dbo.[3B38AB7E-FB80-4E56-9E5A-6ECED7A8FA17] and so on. They had tens of thousands of tables named this way. Query plans were close to unreadable.

Another was a site where the databases were named by their location on the disk. Yes, they had a database named X:\Database Files\Data Files\CoreDB.mdf. And yes that does mean that you end up using it like:

image

Not all odd things that I see are so blatant though. Today I saw a more subtle coding issue. With SQL Server the UNION operation combines to rowsets into a single rowset. If UNION ALL is used then all rows are returned. With just UNION without the ALL, only distinct rows are returned. All good so far.

But until today, I’d never stopped to think about what happens when you mix the two operations. For example, without running the code (or reading further ahead yet), what would you expect the output of the following command to be? (The real code read from a table but I’ve mocked it up with a VALUES clause to make it easier to see the outcome).

image

I was left wondering if there was some type of operation precedence between UNION and UNION ALL. The output rows are:

clip_image001

It isn’t a case of precedence. The operations are just being applied in order. You can see this as follows:

Executing the first part:

image

returns the following with no surprises:

clip_image002

Executing the first two parts:

image

returns all rows from both queries:

clip_image003

Executing the first three parts:

image

returns the following rows. This is formed by taking the previous result, executing the third query then performing a distinct operation on the whole combined rowset.

clip_image004

Executing the entire query then takes this previous result set and appends (based on the UNION ALL), the results from the fourth part of the query.

clip_image005

Regardless of how it actually works, I think it’s important to avoid writing code where the outcome is less than obvious. In this case, it was just a bug but if the code as written was the intended code, adding some parentheses to this query might have made the intent clearer.

And of course in this case, a sprinkle of extra DISTINCT and GROUP BY operations made it a thing of beauty:

image

which actually returned:

clip_image006

So what they really should have written in the first place was:

image

sigh

2015-05-28