SQL Interview: #5: System defined primary key names

This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.

Section: Development
Level: Intro

Question:

If you create a table using the following code, what would the name of the primary key constraint look like?
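The code sample from the original post is missing here; a table along these lines illustrates the situation (the dbo.Customers name and columns are assumptions — note the PRIMARY KEY is declared without a constraint name):

```sql
-- Hypothetical table: PRIMARY KEY declared without a constraint name
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```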

How could you specify the name?

Can you suggest any advantages of providing a name?

Answer:

If you don't specify the name of a primary key, the system will allocate a name similar to this:

PK__Customer__9AA3001A15FDE023

It's basically PK, some underscores, part of the table name, some more underscores, and then a hexadecimal string chosen to make the name unique.

To specify the name, put CONSTRAINT and the name before PRIMARY KEY like this:
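The code sample is missing here; a sketch using a hypothetical dbo.Customers table (the table, column, and constraint names are assumptions):

```sql
-- CONSTRAINT plus a chosen name goes before PRIMARY KEY
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1)
        CONSTRAINT PK_Customers PRIMARY KEY,
    CustomerName nvarchar(100) NOT NULL
);
```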

Some advantages of providing specific names are:

  • A specific name is more meaningful. From the system-generated name above, it's not even obvious which table it's associated with.
  • If you ever need to modify the constraint, you'll need to know its name. It makes it much easier to write scripts when you know the name, instead of having to write code to look it up each time.
  • If the table is recreated, it will usually end up with a different name for the constraint. If you use database comparison tools, the name might be flagged as a difference between databases created with the same script. (Note that some database comparison tools have options to ignore system-generated names).


SQL Interview: #4: INSERT statements and column lists


Section: Development
Level: Medium

Question:

When you are writing an INSERT statement in T-SQL, the column list is usually optional.

Why would including it be a good idea? When would it ever be required (i.e. not optional)?

Answer:

If you don't include a column list, SQL Server will try to match up the values you are inserting with the list of columns in the table. It will do that in column_id order (based on what you see in sys.columns), and it will ignore any columns that are auto-generated, like IDENTITY columns, computed columns, and rowversion columns.

Sometimes you don't want to insert values into all the other columns. Using a column list allows you to specify just the columns that you are inserting.

This is also a good idea because it produces less fragile code. If the table gets redefined with a different order for the columns, your INSERT statement will still work, provided you have listed the columns you are inserting, in the order that you are providing the values.

Column lists are usually optional (though recommended), but they are required when you are inserting into a table while IDENTITY_INSERT is set to ON for that table.
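A sketch of both cases (the table and column names are hypothetical):

```sql
-- Hypothetical table with an IDENTITY column
CREATE TABLE dbo.Tasks
(
    TaskID int IDENTITY(1,1) PRIMARY KEY,
    TaskName nvarchar(100) NOT NULL,
    IsComplete bit NOT NULL
);

-- With a column list: robust to column reordering, and skips TaskID
INSERT dbo.Tasks (TaskName, IsComplete)
VALUES (N'Back up database', 0);

-- A column list is required while IDENTITY_INSERT is ON
SET IDENTITY_INSERT dbo.Tasks ON;
INSERT dbo.Tasks (TaskID, TaskName, IsComplete)
VALUES (42, N'Rebuild indexes', 1);
SET IDENTITY_INSERT dbo.Tasks OFF;
```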


T-SQL 101: #92 Categorizing output rows by using NTILE

I've been talking about the basic window functions in T-SQL and one that's not well known but is surprisingly useful is NTILE.

I'm not sure about the name, but it's probably a generalization of terms like quartile and percentile. I don't know why they didn't call it something slightly more meaningful, but what it does is take the output and break it up into bands or chunks of data.

So if I say NTILE(10), that'll give me a tenth of the rows with a value of 1, another tenth of the rows with a value of 2, and so on.

This is actually really, really useful if you're trying to do things like populate a graph with the bottom 10%, the next 10%, and so on. This function returns the required tiling for that graph, i.e. which band each value needs to go into. In the example, I've called it customer band, and I've gone through in CinemaID order, so they all start with the value 1. Then a tenth of them would have had the value 2, a tenth the value 3, and so on, and the end of the list is shown on the right. There, the last ones will have the value 10.
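The example query from the original post isn't shown here; a sketch of the kind of query being described (the dbo.Cinemas table and its columns are assumptions based on the surrounding text):

```sql
-- Break the rows into 10 bands, in CinemaID order
SELECT CinemaID,
       NTILE(10) OVER (ORDER BY CinemaID) AS CustomerBand
FROM dbo.Cinemas
ORDER BY CinemaID;
```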

This can be very useful.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL Interview: #3: Are bit columns useful in indexes?


Section: Query Performance
Level: Medium

Question:

In SQL Server, a column that is defined with a bit data type can only hold the values 0 or 1, or it can be NULL if the column allows it.

Given such a small number of possible values, are bit columns ever useful in indexes? Can you give an example of when they might or might not be useful?

Answer:

The range of potential values for a data type is not the issue. The selectivity of the values is critical.

So while an index that includes a value that's evenly distributed (i.e. it's zero in half the rows and one in the other half) is unlikely to be useful, a highly skewed distribution of values can make such an index very useful.

For example, a query that's finding a hundred incomplete tasks amongst millions of completed tasks will almost certainly find the column useful in an index. It's also an example of where a filtered index would potentially be useful.
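A sketch of what that filtered index might look like for the scenario described (the table and column names are hypothetical):

```sql
-- Index only the small number of incomplete rows
CREATE INDEX IX_Tasks_Incomplete
ON dbo.Tasks (TaskID)
WHERE IsComplete = 0;

-- A query like this can then seek just the few incomplete rows
SELECT TaskID
FROM dbo.Tasks
WHERE IsComplete = 0;
```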

For a more complete discussion, see this previous blog post.


SQL Interview: #2: NULL and NOT NULL when defining Columns in Tables


Section: Development
Level: Medium

Question:

When you are writing a CREATE TABLE statement to define a new table, after each column, you can write NULL or NOT NULL, but doing that is optional.

Why is it a good practice to include it, and what happens if you leave it out? Are the columns defined as NULL or as NOT NULL? Are there any exceptions to this?

Answer:

If you leave out NULL and NOT NULL in a column definition, many factors determine how the column will be defined. For system-defined data types, SQL Server uses a combination of the ANSI_NULL_DFLT_ON SET option and the ANSI_NULL_DEFAULT database option to determine which value to use.

There are several exceptions. For a few examples: a column declared as part of a PRIMARY KEY will be set to NOT NULL regardless of settings; SPARSE columns will always be set to NULL; and for CLR types and alias data types, the nullability is determined by the data type.

It's always a good practice to write NULL or NOT NULL in the table definition.
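A sketch of being explicit (the table and column names are hypothetical); writing it out avoids any dependence on session or database settings:

```sql
CREATE TABLE dbo.Customers
(
    CustomerID int IDENTITY(1,1) PRIMARY KEY,  -- NOT NULL regardless of settings
    CustomerName nvarchar(100) NOT NULL,       -- explicitly NOT NULL
    PhoneNumber varchar(20) NULL               -- explicitly NULL
);
```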

SQL: UNMASK is now granular in Azure SQL

Dynamic Data Masking was added to SQL Server back in 2016. I've almost never used it since. Why? There were two reasons:

#1: The first reason was that it didn't work in conjunction with other security features. In particular, you couldn't use dynamic data masking in conjunction with Always Encrypted. From the start, I've been telling the product group that this is a misstep. There's very little data that I'd want to hide from my own staff, that I'm happy still being visible to a hosting provider.

This is still an issue today. Some users might just want to use Dynamic Data Masking though, without Always Encrypted, so there might be a use case there, particularly with on-premises systems.

#2: (And this was the big one) UNMASK wasn't granular enough. There was a single UNMASK permission for the whole database. That means it worked nothing like other permissions in the database. We can usually set permissions on schemas (my favourite), tables, and columns. And I didn't see UNMASK as any different. Again, a number of us made that very clear to the product team.

However, this has now changed!

I wanted to call attention to a recent blog post that mentioned, just in passing, that UNMASK is now granular. You can still apply it as before, to the whole database, but you can also apply it at the schema, table, and/or column levels.
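A sketch of what the granular grants look like (the schema, table, column, and user names are hypothetical):

```sql
GRANT UNMASK TO SupportUser;                                   -- whole database (as before)
GRANT UNMASK ON SCHEMA::Sales TO SupportUser;                  -- one schema
GRANT UNMASK ON Sales.Customers TO SupportUser;                -- one table
GRANT UNMASK ON Sales.Customers(EmailAddress) TO SupportUser;  -- one column
```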

Well, at least you can in Azure SQL Database (and Synapse Analytics). One of the things I love about working with Azure SQL Database is being at the front of the queue when T-SQL and DB enhancements are made.

This is great news. Thank you to the product team!

T-SQL 101: #91 Determining positions in a set by using RANK and DENSE_RANK

In my last T-SQL 101 post, I mentioned ROW_NUMBER. It let you put a row number or position beside each row that was returned. Sometimes though, you want a rank instead. A rank is similar but it's like a position in a race.

In the example above, I've used RANK to produce an ordering, based on an alphabetical listing of city names. Notice there's Abercorn, Abercorn, Abercorn, and then Aberdeen. So, like in a race, if three people came first, they all get the value 1. The next person is fourth. Three people came fourth, so then the next one is 7th, and so on.

Dense Rank

That's the most common form of output but occasionally, people want the same thing, but without gaps. DENSE_RANK is like RANK, but it doesn't have gaps.

This is almost like getting to come second in a race, even though three people came first. So the idea here is that there are three rows for Abercorn, and they all get the value 1. Then there's Aberdeen; there are three of those, so they all get the value 2. Abminga comes next, so it gets 3. And so on.
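The example query from the original post isn't shown here; a sketch of the kind of query being described (the dbo.Cities table and CityName column are assumptions based on the example values):

```sql
SELECT CityName,
       RANK()       OVER (ORDER BY CityName) AS CityRank,
       DENSE_RANK() OVER (ORDER BY CityName) AS CityDenseRank
FROM dbo.Cities
ORDER BY CityName;

-- With three Abercorn rows: RANK gives 1, 1, 1, then Aberdeen starts at 4
-- DENSE_RANK gives 1, 1, 1, then Aberdeen starts at 2 (no gaps)
```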


SQL Interview: #1: db_datareader and temporary tables


Section: Security
Level: Medium

Question:

I have an application that connects to the database as a user called ReaderApp. The only role that ReaderApp has been added to is db_datareader. The application was previously only using SELECT statements to read data that was required.

For performance reasons, the development team now wants the application to use temporary tables in their queries. They need to be able to create temporary tables, insert data into them, read the data back from them, and drop them.

What additional permissions (if any) need to be assigned to ReaderApp, or which additional roles (if any) should ReaderApp be assigned to?

Answer:

None. All users who have access to the database have permission to work with temporary tables.
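So a sequence like this works for any user with access to the database, even one that's only a member of db_datareader (the dbo.Orders table and its columns are illustrative):

```sql
-- Every database user can create and use temporary tables
CREATE TABLE #RecentOrders (OrderID int, OrderDate date);

INSERT #RecentOrders (OrderID, OrderDate)
SELECT OrderID, OrderDate
FROM dbo.Orders                      -- read permission comes from db_datareader
WHERE OrderDate >= '20250101';

SELECT OrderID, OrderDate FROM #RecentOrders;

DROP TABLE #RecentOrders;
```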

SQL Interview: Starting a new series of blog posts

Many clients hire me to carry out job interviews on their behalf. If they're hiring someone for a data-related role, they want to make sure they're hiring the right person. That usually means two things:

  • The person will fit well within the culture of the organization.
  • The person is technically competent.

I can't help with the first one. If I've worked with the client for a long time, I might have a pretty good idea but ultimately, that's something the client needs to decide.

But I'm well-placed to help them with the second part, and that's the part that the client often doesn't feel they can do well themselves.

Note: It's not the point of this post, but if you'd like to get me involved in your interviews, please just reach out.

Technical interviews

For the technical interviews, it's not just a basic knowledge test. I aim to work out two things:

  • Does the candidate actually understand the technology that they claim to, and at which level of expertise?
  • How well do they approach problems that they don't immediately know the answer to?

To gauge their understanding, I'll start with fairly easy questions so they get comfortable, then probe different areas in increasing depth, to judge their real understanding in those areas.

For the problem solving, I'll give them scenarios, ask if they have any immediate ideas on potential solutions, then dive into how they'd try to solve it. They can tell me what they'd check, and I tell them what they would have found if they did.

Problem Solving Ability

I'm keen to see how logically they think and how they approach problem solving. I wish I could say that the ability to think logically can easily be learned, but I'm afraid my experience tells me that's just not true.

Back when I worked for HP in the mid-80s, they sent us on a Kepner-Tregoe course on logical troubleshooting. I'm sure it seemed like a great idea, and the course was well constructed. But the people in my class who could logically troubleshoot at the start were the same ones who could do it at the end. It's great to have a core methodology. But to me, it's all about the ability to think logically, and that takes longer than a week to learn. However, it's a critical skill when getting to the bottom of problems.

Blog Series

I decided to make a blog post series that covers a lot of questions that check core understanding of SQL Server topics. I'll include questions at different levels, and on a variety of SQL Server related topics.

I'm not talking about tricky or gotcha questions, or anything like exam prep questions. Just checks on core understanding.

BTW: I've added a new category called SQL Interview. So you can easily find any/all of them amongst my other posts by using the category filter on the blog. I've also got a series of Business Intelligence questions (focussing on Power BI, Tabular data models, Data Factory, etc.) being prepared.

I hope you enjoy reading these. Nothing is ever absolute in this industry, so if you disagree with any of them, feel free to comment and say why.


T-SQL 101: #90 Numbering output rows by using ROW_NUMBER

In SQL Server 2000 and earlier versions, I often heard people ask "How do I output a row number beside each row that's output in my query?"

I remember some people arguing that it wasn't a valid request, as it didn't feel "set-based" but it was an appropriate request, and it could be dealt with in a set-based manner. Sometimes it's very, very useful to be able to do that.

In SQL Server 2005, we got the ROW_NUMBER function, and it did just what people were asking for. You got your normal query output, but also got a column with 1 for the first row, 2 for the second, and so on.

What's really good about this function though, is that it has an OVER clause that lets you specify the order for the numbering, separate to the order for the query. In the example shown above, I've ordered the output rows by CityName, and I've also ordered the row numbers by CityName. But I didn't have to. I could have said this:
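The code sample from the original post is missing here; a sketch of what it likely looked like (the dbo.Cinemas table and column names are assumptions based on the surrounding text):

```sql
-- Number the rows by CinemaID, but sort the output by CityName
SELECT CityName,
       ROW_NUMBER() OVER (ORDER BY CinemaID) AS RowNumber
FROM dbo.Cinemas
ORDER BY CityName;
```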

That would have created row numbers based on the order of the CinemaID but the query would have output rows still in CityName order.

Note also that there was nothing special about the column alias (RowNumber) that I used. That could have been any name.

ROW_NUMBER was a powerful addition to T-SQL, but it was only the first of a set of window functions that were added to the language. We'll see more in later posts.
