T-SQL 101: #87 Summarise sections of data by using GROUP BY

When you calculate an aggregate, the default is that it applies to the entire table, but you might not want that.

For example, I might want to calculate the longest shelf life for products. But I want to calculate that for each size of product.

If we look at the first example, we have added a GROUP BY clause to our query, and it returns exactly that. The output is on the left hand side below the query. For each size (determined by the GROUP BY), the maximum of the shelf life is returned.

But the problem is that even though it's returning me the maximum shelf life days for each size, I have no idea which size corresponds to which answer that it returns.

Almost any time that you use a GROUP BY, while it's not strictly required, you should include the columns that you are grouping by, in the SELECT clause. Then you can work out which answer is which.

The second query's output is on the right-hand side. It's much more useful.

Common Error

While this is all good,  I also see people try and write a query like the second one here:

The first one is fine, but if I'm grouping by the Size, it makes no sense to try to SELECT the OuterQuantity.

The problem in the second query above is that you start by getting products, and grouping them by size. At that point, they're not individual products, they're rows that have the grouping columns and any aggregates. OuterQuantity isn't part of that, so you can't then SELECT it.

The basic rule is: you can only SELECT columns that are either:

  • In the GROUP BY clause (and you should probably SELECT all of these)
  • Columns that you have aggregated
  • Constants (literal values that are unaffected by the grouping)

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Book Review: High Performance SQL Server

I was pleased to get sent a copy of Benjamin Nevarez's new book High Performance SQL Server. I've known Benjamin for a long time. He's a very skilled SQL Server professional, and you'll see him at conferences around the world. (Or at least once Covid is tackled more completely).

And the technical reviewer for the book is another very skilled old friend in Mark Broadbent. So my expectations were high for the new edition of this book.

I wasn't disappointed.

It was refreshing to read a book that covered so much of the core knowledge that's needed when working with SQL Server and getting it to perform well.

Benjamin covers a wide variety of topics: an introduction to how SQL Server works internally (including now on Linux), how to configure it and how to work with tempdb. He then headed into monitoring, so that you could find what issues need to be dealt with. The book then looks at performance troubleshooting, and indexing. It finishes with a discussion of the Intelligent Query Processing features added in recent versions followed by a discussion on storage.

As I read, I kept seeing how he led into topics, and as soon as I'd think "I hope he mentions topic X" as I thought it was important, he'd then describe topic X. There were a few areas where I disagreed, but they were few and far between.

I can't tell you how pleasing it was to read a technical book that was well written, used English well, and  wasn't full of technical errors. I've seen so many books lately that just aren't like this. By comparison, I was the tech reviewer on a book just the other day, that I'd rather I'm not mentioned in the book at all once it's released. It was so poorly written.

Now, were there areas that I disagreed with Benjamin on?

Yes.

In particular, I just don't share the enthusiasm for the in-memory OLTP options in SQL Server, even for temporary objects. I've spent so much time trying them, and let's just say, I don't use them. I'm pretty sure if I extended Benjamin's examples on those a bit further, I think he'd agree.

Were there areas where I'd love to see the book improved?

Yes.

Here are some suggestions:

  • I think there are a few areas that are suffering from being updates, rather than having been written from scratch recently. As an example, Benjamin linked to an MSDN white paper that I wrote in 2008, and while still somewhat relevant, I did an update to that paper in 2012. That would be a better reference.
  • The discussion on storage feels dated. What I would also love to see discussed in this section is info on other current challenges in this area. For example, the impact of de-duplicating I/O subsystems.
  • There are a number of style aspects I'd love to see changed. I'd like to see WideWorldImporters instead of AdventureWorks. I'd like to see statement terminators in all the code (personal bugbear).
  • The order of the topics in theory makes sense, but in practice, I'm not so sure. I think a lot of readers would be put off by being thrown into a discussion of TDS as soon as they start reading.
  • The discussion on filtered indexes needs to be fleshed out further. They're an awesome feature, but you have to learn to use them, not just configure them.

I'd also like to see the book call out more on the importance of appropriate application design. In my work, I find that at least 70% of the SQL Server performance-related issues that I run into, are application design issues. When you're trying to fix them at the back end, it's already way too late to get real outcomes. I realise though, that that's all many people have a chance to do.

The Verdict ?

Overall though? What a wonderful book.   9 out of 10.

 

SQL: Calculating day of the week across a range of years in T-SQL

I had a question from Dale Kerr the other day about whether we had a tool in our SDU Tools collection, that calculated the day of the week (i.e. Tuesday, Thursday) for a given day of the year, across a range of years.

We don't have a specific tool for that, but the a CTE returning the list of years makes it easy.

(UPDATE: SDU Tools version 21 will have WeekdayAcrossYears that implements this)

Here's an example:

The output is shown in the main image above.

SQL: Use TRY_CAST instead of ISNUMERIC and ISDATE

Like most developers, I often need to check if a string value is a valid number or a valid date (or datetime). In T-SQL, the functions provided for this are ISNUMERIC and ISDATE. These are pretty generic functions though, and I almost never use them now. I believe you're now much better off using TRY_CAST instead.

"Numeric" is a fairly vague concept. Which type of number are we checking for?

"Date" is also vague in this context. The ISDATE function says it checks for datetime, or datetime.

Most of the time, what I'm really asking is "Can I safely convert this value to an integer?" or "Can I safely convert this value to a date?", etc.

Ever since we've had TRY_CAST, I think you're better off using it to attempt the operation and fail silently. Use TRY_CONVERT if you want to check using a style, and TRY_PARSE if you want to check a string using a different cultural setting than the one in your current session.

For example with "Can I safely convert this value to an integer?",

instead of writing:

write this:

At least then you can specify the precise data type that you're checking for.

 

 

 

T-SQL 101: #86 Summarizing data with SUM, AVG, MIN, MAX

I mentioned in a previous post that COUNT was an aggregate. The other common aggregates are shown in this table, and no surprise what they do.

SUM adds up or totals the values.

AVG calculates the average of the values.

MIN works out the minimum value.

MAX works out the maximum value.

But if you've started to think about how SQL Server works, you might be wondering about what happens with NULLs.

Aggregates, in general, ignore NULLs.

For SUM, what happens is pretty obvious. MIN and MAX return the minimum and maximum of the values that are not NULL.

And AVG calculates the average without the NULL values. Specifically, that means that if I have the values:

30, NULL, 40, NULL, 50

The average is 40. That's because the total of the non-NULL values is 120 and the count of non-NULL values is 3, so the average is 120 / 3.

You can also use these aggregates to create other aggregates. For example, some database management systems include a RANGE aggregate. That's the range of values, or the difference between the largest and smallest values. So MAX(value) – MIN(value) would give you the equivalent of RANGE(value).

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SDU Tools: Weekday in same week using SQL Server T-SQL

A while back, I got a request to add a new function to our free SDU Tools for developers and DBAs. The user wanted to be able to find a particular day of the week, in a given week. So for example, if I give you a date of 30th April 2021, what is the Thursday in that week? So we've added a new function WeekdayOfSameWeek.

It takes two parameters:

@DayInTargetWeek date – any day that can be used to find the target week
@DayOfWeek int – Sunday = 1, Monday = 2, etc.

Note that the function assumes the week starts on a Sunday. I know my Asian friends (and others) consider the week to start on a Monday. It's easy to change to make it work that way.

The return value is a date.

Find out more

You can see it in action in the main image above, and in the video here. The full current version of the code is also shown below:

You can use our tools as a set or as a great example of how to write functions like these.

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

Latest version of the code

Note: the code might wrap when displayed below.

SQL Server Express and Reporting Services

I'm a fan of SQL Server Express edition. It's a fabulously functional free database that lets you work with up to 10GB of data per database. One aspect that's commonly misunderstood though, is that it also supports a limited version of SQL Server Reporting Services.

There are many scenarios where Express edition is the perfect fit. The Microsoft sales and marketing teams have had an ongoing strange relationship with Express edition because it's free. Much of the documentation tried to suggest that it was for hobbyists, and para-professionals. But Express edition has a great role to play in professional systems too. For example, if you needed a small database to run on a series of point of sale devices, it could well be the right answer, particularly if it's then connected to one of the higher editions as a central SQL Server system.

I used to use it many years ago as an easy way to provide a system test system for clients using one of our applications. That allowed them to get established quickly without SQL Server licensing. When they had been using it for a while, and their databases grew in size, that was the easiest upgrade you'd ever convince customers to do.

Reporting

What wasn't understood well early on, is that reporting was also needed. I remember arguing passionately about why Express edition needed to include some form of Reporting Services. The marketing teams didn't at first understand that if a software house needed a reporting solution, they weren't going to use something different on Express to what they used on other editions. They'd simply use a different reporting tool where they only had to write the code once.

Express edition regularly helped us to sell paid SQL Server licenses.

Fortunately, the teams eventually saw the light on this one, and a version of Express with Advanced Services was released. It included a limited form of Reporting Services. It was perfectly functional, but didn't allow you to report on any data, except from the Express edition it was running from. (This avoided Microsoft providing a free reporting tool for the world). I thought that was a great option.

So where it is now?

Lately though, I've been asked where Reporting Services for Express went. It's not in that Advanced Services option any more. Other good things are, like full text, but not Reporting Services.

The reason for that, is that since SQL Server 2016, Reporting Services has become a separate install. It's not directly bundled with SQL Server. And that applies to Express edition as well.

You need to install Reporting Services separately. It still understands deployment against Express edition.

You'll find details of installing it here.

One thing to note, is that since the 2016 change, multiple instances of Reporting Services on a single machine are not supported.  (You could do that previously).

Isn't it all about Power BI now?

Power BI is outstanding, but there are many scenarios where Reporting Services is still the best tool. Standard reports from Express edition is a great example.

If you need to learn more about Reporting Services, we've got a course to help you learn to use it properly. You'll find details of it, and our other courses, here: https://training.sqldownunder.com.

 

T-SQL 101: #85 Counting rows and column values with COUNT

In previous posts, I looked at how to read data from a table. Now, we need to look at how we do calculations on the data in the table.

The most basic calculation we might do is to count the number of rows in the table. The first example above does that.

What about the asterisk?

But also notice that is has an asterisk in the query. Some people worry about the asterisk being in their queries as usually having an asterisk isn't a good idea. In fact, some customers have automated systems for checking code, and the automated system might complain about the asterisk.

You could avoid the asterisk by writing this instead:

That might look odd at first, but what that says is "return the value 1 for every row, and count the number of 1's". So that's the same as the number of rows. I often write these queries with a 1 instead of an asterisk, but in the end, with SQL Server, it really is the same query.

Another Column

Look at the second query though. In that example, I've said COUNT(OrderComments) and you might think that would work the same.

But the first rule to learn about aggregates is that most of them ignore NULL values. So what this second query is actually saying is "count all the rows where OrderComments isn't NULL".

In my system, even though there are 88,640 orders in the system, and the first query returned 88640 as expected, there are only 38 that have order comments where the value isn't null, and so the second query returned 38.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SDU Tools: Date of Orthodox Easter in SQL Server T-SQL

Some time back, we added DateOfEasterSunday to our free SDU Tools for developers and DBAs. Given it was the Christian Easter Sunday, almost immediately, I got a request for the Greek Orthodox Easter. That date isn't of course just the Greek one, so we've added a new function DateOfOrthodoxEaster.

It's based on a concept from Antonios Chatzipavlis. Thanks !

It takes a single parameter:

@Year int – the year to find the date for

And the return value is a date.

Find out more

You can see it in action in the main image above, and in the video here. The full current version of the code is also shown below:

You can use our tools as a set or as a great example of how to write functions like these.

Access to SDU Tools is one of the benefits of being an SDU Insider, along with access to our other free tools and eBooks. Please just visit here for more info:

http://sdutools.sqldownunder.com

Latest version of the code

Note: the code might wrap when displayed below.

T-SQL 101: #84 Avoiding data type conversion errors with TRY_CAST, TRY_CONVERT and TRY_PARSE

In a previous post, I showed how to use CAST and CONVERT. What I didn't mention before though, is what happens when the conversion will fail. If I try to convert the string 'hello' to an int, that just isn't going to work. Of course, what does happen, is the statement returns an error.  Same thing happens if I try to convert the 30th February 2016 to a date. There aren't 30 days in February. Again, an error will be returned.

SQL Server 2012 though, added the ability to try to do a conversion, if it works, return the value, and if it fails, just return NULL. And so we got three versions of this:

TRY_CAST

TRY_CONVERT

TRY_PARSE

You can see in the example, that the first string isn't a valid date, so it returns NULL. The second one is valid, so we get back the converted value.

Now a common mistake that people make when writing T-SQL, is to write code in this type of format:

The problem with this, is that you might still get errors saying that happened_when couldn't be converted to a date. You'd think the WHERE clause would be applied first and would avoid that. But that might not happen.

Now you can deal with this in a robust way, by writing:

Even better, this would probably optimize simpler by just writing:

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.