SQL Interview: #9: Computed columns in table definitions

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

Consider the following code:

Will the CreatedDate column return the same value in both SELECT statements?

Answer:

When you define a computed column, the value is calculated when it is SELECTed. In this case, the two SELECT operations will occur at different times, and different values will be returned for that column.

It is possible with some computed columns to add the term PERSISTED. In that case, the value is calculated at INSERT or UPDATE but is then stored. The same value would be returned every time it is SELECTed.

However, in this case, you cannot apply PERSISTED to the computed column, as the expression SYSDATETIME() is not deterministic.

 

T-SQL 101: #94 Returning messages to the client by using PRINT and RAISERROR in T-SQL

There are times when you're writing in any programming language, that you want to output messages back to the client. In T-SQL, the statement provided for that is the PRINT statement.

PRINT is simple enough to use. You give it a  string to return and it prints that string.

It's different to the SELECT statement, because the SELECT statement returns a rowset (i.e. a set of rows).

In the main image above, you can see a script with both a PRINT and a SELECT. When that script is executed, two things happen. The output of the SELECT statement is returned in the Results tab in SQL Server Management Studio as below:

But notice that the PRINT output isn't there. Instead, it's returned in the Messages tab as below:

Using RAISERROR instead of PRINT

PRINT is actually a specific instance of the RAISERROR statement. Instead of using PRINT, you could use RAISERROR like this:

Notice that the output is identical. The first parameter to RAISERROR is the message, the second is the severity, and the third is the state. In this case, any severity from 1 to 10 would have worked. You can just make the state be 1 and ignore it until we discuss RAISERROR in much more detail later.

RAISERROR and NOWAIT

So why would you ever use RAISERROR instead of PRINT? After all, PRINT is simpler.

The answer is related to when the messages are returned. For example, if you put PRINT statements in a stored procedure, it's only after the stored procedure completes that you'll see the output.

If however, you use the NOWAIT option in RAISERROR, the messages come back immediately.

This can be very useful when you are debugging code.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL Interview: #8: Why should you avoid the use of db_datareader and db_datawriter?

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Security
Level: Intro

Question:

Many applications connect to SQL Server databases with users that are members of the db_datareader and/or db_datawriter roles?

Why should you avoid using those roles?

Answer:

db_datareader and db_datawriter are fixed database roles that were provided for convenience. They aren't a good option from a security perspective.

Whenever you add a user (or group) to any fixed role, you are assigning them a collection of permissions. By definition, that list of permissions will need to include at least all the permissions they require.

However, invariably when you use fixed roles, you are also assigning additional permissions or access to objects that aren't required.

A more secure solution is to create a role and grant it only the permissions that are required, and then assign the user (or group) to that new role.

 

SQL Interview #7: Are statement terminators ever required in T-SQL?

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

Many versions of SQL require every SQL statement to be terminated with a semicolon.

Since it was created, T-SQL has listed statement terminators as optional.

Are there any situations where they are required, and not optional?

Answer:

Since SQL Server 2005, some T-SQL statements must be separated from previous statements by a statement terminator.

The statements involved are WITH, SEND, RECEIVE.

In addition, a MERGE statement must have a statement terminator.

Using statement terminators is generally considered good practice, even though many of the Microsoft-supplied tools generate scripts without them.

Ever since SQL Server 2005, the product deprecation list has stated that the optional nature of statement terminators is deprecated. For this reason alone, you should use statement terminators.

But there are other potential issues. Consider this nasty example from fellow MVP Erland Sommarskog:

What is not immediately obvious is that THROW would never be executed. Instead, it would be interpreted as a column alias for ERROR_MESSAGE().

Using semicolons as statement terminators avoids this type of issue.

 

SQL Interview: #6: Multi-row INSERT operations

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

Look at the following multi-row INSERT statement:

The column CustomerGroupName is defined as NOT NULL so the second row cannot be inserted.

How many rows are inserted by this statement, assuming there are no other errors?

Answer:

INSERT statements are atomic, even for multi-row INSERT statements. Either all the rows are inserted, or none are.

If one row fails (as in this case), no rows are inserted.

 

T-SQL 101: #93 Restarting row numbering by using PARTITION BY

Imagine that I've used ROW_NUMBER to number all the cinemas in my database. I'll get values from 1 to the maximum number of cinemas. But what if I want to number the cinemas within each city? i.e. Aberdeen has three cinemas, so number them 1, 2, and 3. But when we get to the next city, start the numbering again. We can do this by adding PARTITION BY to the OVER clause.

PARTITION BY and OVER can do many, many things and I don't want to get too far into complexity on this in today's post, but I do want to mention that when you have a window function like ROW_NUMBER, RANK, DENSE_RANK, and NTILE, instead of just returning one large set of values, you can partition the values by using PARTITION BY.

As another simple example, imagine I need a row number for each order line in an order. I can just partition by the order number.

It's not just row numbering though. With NTILE, I could allocate the results of exams into bands, but partition the entire set by which class the students are in, or by which exam they took.

Partitioning the data into groups can be very useful.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL Interview: #5: System defined primary key names

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

If you create a table using the following code, what would the name of the primary key constraint look like?

How could you specify the name?

Can you suggest any advantages of providing a name?

Answer:

If you don't specify the name of a primary key, the system will allocate a name similar to this:

PK__Customer__9AA3001A15FDE023

It's basically PK some underscores, part of the table name, some more underscores, and part of a GUID string, chosen to be unique.

To specify the name, put CONSTRAINT and the name before PRIMARY KEY like this:

Some advantages of providing specific names are:

  • A specific name is more meaningful. It's not obvious from the system-generated name above, even which table it's associated with.
  • If you ever need to modify the constraint, you'll need to know its name. It makes it much easier to write scripts when you know the name, instead of having to write code to look it up each time.
  • If the table is recreated, it will usually end up with a different name for the constraint. If you use database comparison tools, the name might be flagged as a difference between databases created with the same script. (Note that some database comparison tools have options to ignore system-generated names).

 

SQL Interview: #4: INSERT statements and column lists

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Medium

Question:

When you are writing an INSERT statement in T-SQL, the column list is usually optional.

Why would including it be a good idea? When would it ever be required (i.e. not optional)?

Answer:

If you don't include a column list, SQL Server will try to match up the values you are inserting, with the list of columns in the table. It will do that in column_id order (based on what you see in sys.columns) and it will ignore any columns that are auto-generated like IDENTITY columns, computed columns, and rowversion columns.

Sometimes you aren't wanting  to insert all the other columns. Using a column list allows you to specify just the columns that you are inserting.

This is also a good idea as it produces less fragile code. If the table gets redefined with a different order for the columns, your INSERT statement would still work if you have listed the columns you are inserting, in the order that you are providing them.

Column lists are usually optional but recommended, but they are required when you are inserting into a table with SET IDENTITY_INSERT set to ON.

 

T-SQL 101: #92 Categorizing output rows by using NTILE

I've been talking about the basic window functions in T-SQL and one that's not well known but is surprisingly useful is NTILE.

I'm not sure on the name but it's probably short for percentile. I don't know why they didn't call it a slightly more meaningful name, but what it says is take the output and break it up into bands or chunks of data.

So if I say NTILE(10), that'll give me a tenth of the rows with a value of 1, another tenth of the rows with a value of 2, and so on.

This is actually really, really useful if you're trying to do things like populate a graph with the bottom 10%, the next 10% and so on. This function would return you the required tiling for that graph i.e. which band each value needs to go into. In the example, I've called it customer band, but I've just gone through CinemaID order and so they start all being number one. Then there would have been a tenth of them with number 2. A tenth of them with number 3 and so on, and the end of the list is shown on the right. There the the last ones will have the value 10.

This can be very useful.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL Interview: #3: Are bit columns useful in indexes?

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Query Performance
Level: Medium

Question:

In SQL Server a column that is defined with a bit data type can only have the values 0, or 1, or it can be NULL if the column allows it.

Given such a small number of possible values, are bit columns ever useful in indexes? Can you give an example of when they might or might not be useful?

Answer:

The range of potential values for a data type is not the issue. The selectivity of the values is critical.

So while an index that includes a value that's evenly distributed (i.e. it's zero in half the rows and one in the other half), is unlikely to be useful, a highly skewed distribution of values can make them very useful.

For example, a query that's finding a hundred incomplete tasks amongst millions of completed tasks, will most certainly find the column useful in an index. It's also an example of what would potentially be useful in a filtered index.

For a more complete discussion, see this previous blog post.