SQL Interview: #14: Set operations using EXCEPT

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

UNION and UNION ALL are commonly used to combine two sets of rows into a single set of rows.

EXCEPT is another set operator.

Can you explain what it does?

Answer:

EXCEPT returns the distinct rows from the first set of rows that do not also appear in the second set. In other words, it removes any rows from the first set if the same rows appear in the second set.

For example, in the code below:
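A minimal sketch of such a query, assuming hypothetical dbo.Customers and dbo.Suppliers tables that each have a TradingName column (the names are illustrative, not taken from a real schema):

```sql
-- Hypothetical tables: dbo.Customers and dbo.Suppliers,
-- each with a TradingName column of a compatible string type
SELECT c.TradingName
FROM dbo.Customers AS c
EXCEPT
SELECT s.TradingName
FROM dbo.Suppliers AS s;
```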

The query returns all the Trading Names for customers unless a supplier also has that same name.

In other database engines (e.g. Oracle), this operator is called MINUS.

Note: For EXCEPT to work, both queries must return the same number of columns, with compatible data types in corresponding positions. The values must also be "comparable". For strings, that means compatible collations. Data types that cannot be compared, such as the spatial data types, are not supported.

SQL Interview: #13: Impact of optimize for adhoc workloads

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Server Tuning
Level: Advanced

Question:

SQL Server 2008 introduced a server option named optimize for ad hoc workloads.

Can you explain the problem that it is designed to solve, what most commonly causes the problem, and what impact this option has when you enable it?

Answer:

On a SQL Server, you can end up with a large amount of memory being taken up by query plans that will likely never be used again.

There are two primary causes of this situation:

  • The less common cause is that you have a large number of adhoc queries being executed as one-off queries.
  • The most common cause is that you are using a framework like LINQ that does not handle data typing properly, and causes a large number of query plans to be created for each query. You might also be using lower-level frameworks like ADO.NET incorrectly.

As an example, command objects in ADO.NET have a parameters collection. If you add parameters using the AddWithValue() method, you specify the parameter name, and the value, but you do not specify the data type. The problem with this is that the framework then tries to work out what the data type is, from the value.

This means that if you pass a string like 'Hello', then it might guess nvarchar(5) but if you pass a string like 'Hello There', it might guess nvarchar(11).

The lengths of data types are part of the signature for each query plan. This means that you can easily end up with different query plans for every combination of every length of string that has ever been passed to the query. We call this Plan Cache Pollution.
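The effect can be seen from the T-SQL side. The following is a minimal sketch, assuming a hypothetical dbo.Customers table with CustomerID and CustomerName columns. It sends the same statement twice with the parameter declared at two different lengths, which is roughly what length inference from the value produces, and then looks for the resulting entries in the plan cache. Querying the plan cache views requires appropriate server-level permissions.

```sql
-- Hypothetical table: dbo.Customers (CustomerID, CustomerName).
-- Same statement text, but the parameter is declared with two different lengths.
EXEC sys.sp_executesql
    N'SELECT CustomerID FROM dbo.Customers WHERE CustomerName = @Name;',
    N'@Name nvarchar(5)',
    @Name = N'Hello';

EXEC sys.sp_executesql
    N'SELECT CustomerID FROM dbo.Customers WHERE CustomerName = @Name;',
    N'@Name nvarchar(11)',
    @Name = N'Hello There';

-- Each distinct parameter definition is expected to show up as its own cache entry
SELECT cp.usecounts, cp.objtype, cp.size_in_bytes, st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%dbo.Customers WHERE CustomerName = @Name%'
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';  -- exclude this query itself
```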

The correct way to fix this is to avoid ever using methods like AddWithValue() and instead use a method where you specify the data type. The challenge here is that many frameworks like LINQ have this behaviour baked in, and the developer cannot change it.

When the optimize for ad hoc workloads option has been set, the first time a query is seen, only a small stub containing the hash for the query is stored (so the query can be remembered), and the full query plan is not stored. The next time the same query is seen, it is compiled again and the compiled plan is then stored for reuse.

This avoids the plan cache becoming littered with one-off query plans.
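As a rough sketch, the option can be enabled on a SQL Server instance with sp_configure. It is an advanced option, so advanced options need to be visible first. The final query is one way to check how many stubs are currently held in the cache.

```sql
-- Make advanced options visible
EXEC sys.sp_configure N'show advanced options', 1;
RECONFIGURE;

-- Enable the option (1 = on, 0 = off)
EXEC sys.sp_configure N'optimize for ad hoc workloads', 1;
RECONFIGURE;

-- Entries stored as stubs rather than full compiled plans
SELECT cp.cacheobjtype, cp.objtype, cp.usecounts, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
WHERE cp.cacheobjtype = N'Compiled Plan Stub';
```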

Another option taken by some frameworks is to just use varchar(4000) or nvarchar(4000), or worse nvarchar(max), for all strings. That will of course avoid the plan cache pollution, but it is then likely to cause oversized memory grants and poorer quality plans.

 

SQL Interview: #12: Using UNION vs UNION ALL

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

Consider the following code:

A: What would be the difference in results if the query was changed to use a UNION ALL instead of the UNION?

B: What are the performance differences between using a UNION ALL and a UNION in a SELECT statement?

C: How could this query be rewritten without the UNION?

Answer:

A: If the query was changed to use UNION ALL, a customer might be returned more than once. A UNION performs a DISTINCT operation on the results of the query. A UNION ALL does not perform the DISTINCT operation on the results.
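The difference is easy to see with a small self-contained sketch. The customer names here are made-up literals, used only for illustration:

```sql
-- UNION removes the duplicate 'Contoso' row: 3 rows returned
SELECT v.CustomerName
FROM (VALUES (N'Contoso'), (N'Fabrikam')) AS v (CustomerName)
UNION
SELECT v.CustomerName
FROM (VALUES (N'Contoso'), (N'Northwind')) AS v (CustomerName);

-- UNION ALL keeps every row: 4 rows returned, 'Contoso' appears twice
SELECT v.CustomerName
FROM (VALUES (N'Contoso'), (N'Fabrikam')) AS v (CustomerName)
UNION ALL
SELECT v.CustomerName
FROM (VALUES (N'Contoso'), (N'Northwind')) AS v (CustomerName);
```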

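B: A UNION generally involves more work and is slower, as it needs to perform a DISTINCT operation on the results, which typically means sorting or hashing the rows. A UNION ALL simply returns the rows from both queries.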

C: One option would be to rewrite it as follows:
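A sketch of the idea, assuming the query combines two filters over the same hypothetical dbo.Customers table (the table, columns, and predicates below are assumptions made for illustration):

```sql
-- Assumed shape of the UNION query (hypothetical table and columns):
--   SELECT CustomerID, CustomerName FROM dbo.Customers WHERE City = N'Sydney'
--   UNION
--   SELECT CustomerID, CustomerName FROM dbo.Customers WHERE CreditLimit > 10000;

-- Single-query form: each customer is returned at most once,
-- provided CustomerID is unique in dbo.Customers
SELECT c.CustomerID, c.CustomerName
FROM dbo.Customers AS c
WHERE c.City = N'Sydney'
   OR c.CreditLimit > 10000;
```

The two forms only return the same rows when the selected columns include a unique key. Without one, the UNION would also collapse duplicate rows that the OR version keeps.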

 

SQL Interview: #11: Adding a column in the middle of a table's columns

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

When you add a column to an existing SQL Server table, the column is added to the end of the table. The column will end up with the highest column_id.

If a developer asks you how a column can be added to the middle of a table instead, what is your advice and how would you approach the request?

Answer:

The first part of the advice is that ideally you won't care what order the columns appear in within your table. However, I completely understand that a developer might like to see columns grouped appropriately in a list of columns for a table, rather than just a random list of columns.

Unfortunately, SQL Server does not have an option like MySQL (for example), where you can add "FIRST" or "AFTER" (followed by a column name) when adding columns.

With SQL Server, the general approach is to:

  • Copy the data out to a temporary table
  • Drop any foreign key constraints that reference the table
  • Drop and recreate the table with the desired order
  • Copy the data back in from the temporary table
  • Recreate any foreign key constraints that reference the table
  • Drop the temporary table
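The steps above can be sketched as follows. This is a minimal outline, assuming a hypothetical dbo.Customers table referenced by a hypothetical dbo.Orders table, with a new EmailAddress column placed between two existing columns. It assumes CustomerID is not an IDENTITY column. Any indexes, triggers, and permissions on the dropped table would also need to be recreated.

```sql
BEGIN TRANSACTION;

-- 1. Copy the data out to a temporary table
SELECT CustomerID, CustomerName, PhoneNumber
INTO #CustomersCopy
FROM dbo.Customers;

-- 2. Drop any foreign key constraints that reference the table
ALTER TABLE dbo.Orders DROP CONSTRAINT FK_Orders_Customers;

-- 3. Drop and recreate the table with the desired column order
DROP TABLE dbo.Customers;

CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL
        CONSTRAINT PK_Customers PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL,
    EmailAddress nvarchar(256) NULL,   -- the new column, in the middle
    PhoneNumber nvarchar(20) NULL
);

-- 4. Copy the data back in from the temporary table
INSERT dbo.Customers (CustomerID, CustomerName, PhoneNumber)
SELECT CustomerID, CustomerName, PhoneNumber
FROM #CustomersCopy;

-- 5. Recreate any foreign key constraints that reference the table
ALTER TABLE dbo.Orders
    ADD CONSTRAINT FK_Orders_Customers
    FOREIGN KEY (CustomerID) REFERENCES dbo.Customers (CustomerID);

-- 6. Drop the temporary table
DROP TABLE #CustomersCopy;

COMMIT TRANSACTION;
```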

 

 

SQL Interview: #10: System-defined default names

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

When you define a column default with code like below:

the system will define the name of the default. Can you give examples of why specifying the name of the default instead of letting the system supply it would be a good practice?

Answer:

There are several reasons. Here are three:

If you ever need to change the default value, you will need to know the name of the default constraint to be able to remove it, before you add a new default. This is much easier if you already know the name of the default.

If you ever need to drop the column, in SQL Server, you must first drop any default on the column. Again, that is much easier if you already know the name of the default. (Note that other database engines like PostgreSQL do not allow you to name defaults, but they also automatically drop them when dropping columns.)

If you have created the table in multiple databases and you are using database comparison tools to check for differences, having consistent names in the script avoids the potential detection of a difference. (Note that some comparison tools can ignore system names like these.)
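To make the difference concrete, here is a minimal sketch using a hypothetical dbo.Customers table with one unnamed default and one named default. The table, column, and constraint names are assumptions chosen for illustration.

```sql
-- Hypothetical table: IsActive has a system-named default,
-- CreatedOn has an explicitly named default
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL,
    IsActive bit NOT NULL DEFAULT (1),
    CreatedOn datetime2(0) NOT NULL
        CONSTRAINT DF_Customers_CreatedOn DEFAULT (SYSDATETIME())
);

-- Named default: it can be dropped directly
ALTER TABLE dbo.Customers DROP CONSTRAINT DF_Customers_CreatedOn;

-- Unnamed default: the system-generated name has to be looked up first
SELECT dc.name
FROM sys.default_constraints AS dc
WHERE dc.parent_object_id = OBJECT_ID(N'dbo.Customers')
  AND COL_NAME(dc.parent_object_id, dc.parent_column_id) = N'IsActive';
```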

 

SQL Interview: #9: Computed columns in table definitions

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

Consider the following code:

Will the CreatedDate column return the same value in both SELECT statements?

Answer:

When you define a computed column without PERSISTED, the value is calculated each time it is SELECTed, not when the row is inserted. In this case, the two SELECT operations will occur at different times, and different values will be returned for that column.

It is possible with some computed columns to add the term PERSISTED. In that case, the value is calculated at INSERT or UPDATE but is then stored. The same value would be returned every time it is SELECTed.

However, in this case, you cannot apply PERSISTED to the computed column, as the expression SYSDATETIME() is not deterministic.
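A minimal sketch of this behaviour, assuming a hypothetical dbo.Events table whose CreatedDate column is computed from SYSDATETIME():

```sql
-- Hypothetical table with a non-persisted computed column
CREATE TABLE dbo.Events
(
    EventID int NOT NULL,
    CreatedDate AS SYSDATETIME()
);

INSERT dbo.Events (EventID) VALUES (1);

-- The two SELECTs run at different times, so CreatedDate differs between them
SELECT EventID, CreatedDate FROM dbo.Events;
WAITFOR DELAY '00:00:02';
SELECT EventID, CreatedDate FROM dbo.Events;

-- Adding PERSISTED would be rejected, because SYSDATETIME() is nondeterministic:
-- ALTER TABLE dbo.Events ADD CreatedDatePersisted AS SYSDATETIME() PERSISTED;
```

If the intent is to record when a row was created, an ordinary column with a default of SYSDATETIME() is the usual alternative.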

 

SQL Interview: #8: Why should you avoid the use of db_datareader and db_datawriter?

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Security
Level: Intro

Question:

Many applications connect to SQL Server databases with users that are members of the db_datareader and/or db_datawriter roles.

Why should you avoid using those roles?

Answer:

db_datareader and db_datawriter are fixed database roles that were provided for convenience. They aren't a good option from a security perspective.

Whenever you add a user (or group) to any fixed role, you are assigning them a collection of permissions. By definition, that list of permissions will need to include at least all the permissions they require.

However, invariably when you use fixed roles, you are also assigning additional permissions or access to objects that aren't required.

A more secure solution is to create a role and grant it only the permissions that are required, and then assign the user (or group) to that new role.
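As a rough sketch of that approach, the following creates a role with only the permissions an application needs. The role, user, table, and schema names are hypothetical, and the ALTER ROLE ... ADD MEMBER syntax assumes SQL Server 2012 or later.

```sql
-- Hypothetical role for one application
CREATE ROLE SalesAppRole;

-- Grant only what the application actually requires
GRANT SELECT ON OBJECT::dbo.Customers TO SalesAppRole;
GRANT SELECT, INSERT, UPDATE ON OBJECT::dbo.Orders TO SalesAppRole;
GRANT EXECUTE ON SCHEMA::Sales TO SalesAppRole;

-- Add the application's database user (or a group) to the role
ALTER ROLE SalesAppRole ADD MEMBER SalesAppUser;
```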

 

SQL Interview: #7: Are statement terminators ever required in T-SQL?

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

Many versions of SQL require every SQL statement to be terminated with a semicolon.

Since it was created, T-SQL has listed statement terminators as optional.

Are there any situations where they are required, and not optional?

Answer:

Since SQL Server 2005, some T-SQL statements must be separated from previous statements by a statement terminator.

The statements involved are WITH (when it introduces a common table expression) and the Service Broker statements SEND and RECEIVE.

In addition, a MERGE statement must have a statement terminator.

Using statement terminators is generally considered good practice, even though many of the Microsoft-supplied tools generate scripts without them.

Ever since SQL Server 2005, the product deprecation list has stated that the optional nature of statement terminators is deprecated. For this reason alone, you should use statement terminators.

But there are other potential issues. Consider this nasty example from fellow MVP Erland Sommarskog:

What is not immediately obvious is that THROW would never be executed. Instead, it would be interpreted as a column alias for ERROR_MESSAGE().

Using semicolons as statement terminators avoids this type of issue.
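The pitfall can be reproduced with a minimal sketch along these lines. It is an illustration of the pattern, not the original listing.

```sql
-- Without terminators: THROW is parsed as a column alias for ERROR_MESSAGE(),
-- so the error message is returned as a result set and the error is not re-raised
BEGIN TRY
    SELECT 1 / 0;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE()
    THROW
END CATCH;

-- With terminators: the message is selected, then the error is re-raised
BEGIN TRY
    SELECT 1 / 0;
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE();
    THROW;
END CATCH;
```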

 

SQL Interview: #6: Multi-row INSERT operations

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

Look at the following multi-row INSERT statement:

The column CustomerGroupName is defined as NOT NULL so the second row cannot be inserted.

How many rows are inserted by this statement, assuming there are no other errors?

Answer:

INSERT statements are atomic, even for multi-row INSERT statements. Either all the rows are inserted, or none are.

If one row fails (as in this case), no rows are inserted.
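This can be checked with a minimal sketch, assuming a hypothetical dbo.CustomerGroups table in which CustomerGroupName is NOT NULL. The values are made up for illustration.

```sql
CREATE TABLE dbo.CustomerGroups
(
    CustomerGroupID int NOT NULL,
    CustomerGroupName nvarchar(50) NOT NULL
);

-- The second row violates the NOT NULL constraint, so the whole statement fails
INSERT dbo.CustomerGroups (CustomerGroupID, CustomerGroupName)
VALUES (1, N'Retail'),
       (2, NULL),
       (3, N'Wholesale');

-- None of the three rows were inserted
SELECT COUNT(*) AS RowsInserted
FROM dbo.CustomerGroups;  -- 0
```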

 

SQL Interview: #5: System defined primary key names

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

If you create a table using the following code, what would the name of the primary key constraint look like?

How could you specify the name?

Can you suggest any advantages of providing a name?

Answer:

If you don't specify the name of a primary key, the system will allocate a name similar to this:

PK__Customer__9AA3001A15FDE023

It's basically PK, some underscores, part of the table name, some more underscores, and part of a GUID string, chosen to be unique.

To specify the name, put CONSTRAINT and the name before PRIMARY KEY like this:
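A minimal sketch, assuming hypothetical table and constraint names. The first definition leaves the name to the system, the second names the constraint, and the final query shows both resulting names.

```sql
-- System-named primary key
CREATE TABLE dbo.Customers
(
    CustomerID int NOT NULL PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

-- Explicitly named primary key: CONSTRAINT and the name go before PRIMARY KEY
CREATE TABLE dbo.CustomersNamed
(
    CustomerID int NOT NULL
        CONSTRAINT PK_CustomersNamed PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);

-- Compare the two constraint names
SELECT OBJECT_NAME(kc.parent_object_id) AS TableName, kc.name AS ConstraintName
FROM sys.key_constraints AS kc
WHERE kc.type = 'PK'
  AND kc.parent_object_id IN (OBJECT_ID(N'dbo.Customers'), OBJECT_ID(N'dbo.CustomersNamed'));
```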

Some advantages of providing specific names are:

  • A specific name is more meaningful. It isn't even obvious from the system-generated name above which table it's associated with.
  • If you ever need to modify the constraint, you'll need to know its name. It makes it much easier to write scripts when you know the name, instead of having to write code to look it up each time.
  • If the table is recreated, it will usually end up with a different name for the constraint. If you use database comparison tools, the name might be flagged as a difference between databases created with the same script. (Note that some database comparison tools have options to ignore system-generated names.)