SQL: Storing the names of objects in SQL Server tables and variables

When I'm writing SQL Server scripts, I often need to store the names of SQL Server objects (like databases, schemas, tables, procedures, etc.) in variables.

That can also happen when I'm creating tables. I might need to store a SQL Server object name (like a login name or a user name) in a column of a table.

So which data type should be used? varchar(100)? varchar(200)? nvarchar(max)?

The answer is: sysname

sysname is a built-in data type that's currently stored as nvarchar(128).

While you could use nvarchar(128) directly, that wouldn't be a great idea. If the SQL Server team ever needs to change the size of object names, code that hard-coded nvarchar(128) will have a problem. If, as recommended, you've used the sysname data type, you'll be fine.
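Here's a minimal sketch of both uses (the table and variable names are just for illustration):

DECLARE @TableName sysname = N'dbo.Orders';

CREATE TABLE dbo.AuditTrail
(
    AuditTrailID int IDENTITY(1,1) PRIMARY KEY,
    LoginName sysname NOT NULL, -- holds a SQL Server login name
    EventTime datetime2 NOT NULL DEFAULT SYSDATETIME()
);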

Fixing Locking and Blocking Issues in SQL Server – Part 6 – Avoiding Deadlocks

This is part 6 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking
  • Part 4 looked at what deadlocks really are and how SQL Server handles them
  • Part 5 looked at how applications should handle deadlocks

Today, though, I want to look at how to try to avoid deadlocks in the first place.

You Can't

Let's start with the bad news. You can't totally avoid deadlocks without doing something like serializing all access to the database.

As I mentioned in Part 5, you need to expect and handle deadlocks when they occur.

Now that we have the bad news out of the way, let's look at things that can help to minimize deadlocks. Even though we make our applications capable of handling them, I want to see as few of them as possible.

Fix the query performance

Part one of this series talked a bit about this. I often see significant blocking issues (and associated deadlocks) completely disappear when the queries run quickly.

The longer that a query runs for, the longer it's likely to hold locks. We want to minimize that time. Indexing is your friend here.

Minimize the locking (volume and duration)

The fewer things that are blocked, and the shorter the time that they're blocked, the less chance there is of deadlocks in the first place. There are four key aspects to this:

First (no surprise), indexing is a key issue here. If you don't give SQL Server suitable indexes to let it find just the data that you need to lock, you shouldn't be surprised when it ends up locking entire partitions or tables.
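As a hypothetical example (the table, column, and index names are invented), a suitable index lets an update lock just the rows it touches:

-- without an index on CustomerID, this update might scan the table
-- and lock far more rows than it needs to
UPDATE dbo.Orders
SET IsPriority = 1
WHERE CustomerID = 12345;

-- a supporting index lets SQL Server find and lock just those rows
CREATE INDEX IX_Orders_CustomerID ON dbo.Orders (CustomerID);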

Second, the design of your tables is also an issue. A properly normalized database schema is easier to work with in terms of minimizing locks.

Third, you need to use the lowest isolation level that works for your queries. Part two of this series talked about using RCSI. Often that will really help. Your biggest pain point, though, is going to come from applications that insist on using Serializable as an isolation level. Unfortunately, many Microsoft applications default to using this (I'm looking at you, BizTalk SQL Adapter), as do many other things in Windows like Component Services. Whenever you have a choice and it's appropriate, reduce the isolation level to the lowest you can work with.
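A quick sketch of checking and lowering a session's isolation level:

-- check the current session's isolation level
-- (2 = Read Committed, 4 = Serializable)
SELECT transaction_isolation_level
FROM sys.dm_exec_sessions
WHERE session_id = @@SPID;

-- explicitly request the lowest level that still works for your queries
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;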

You might be wondering about the use of NOLOCK here. Clearly it might help if you can deal with the risk. Basically, you're giving up consistency in exchange for reduced locking and better performance. If you're working only with archive data that isn't changing, it might be an OK option.

If you're working with data that's being actively changed, it's a significant risk. It's hard to explain duplicated or missing rows on reports, and even harder to explain data on your report that doesn't exist in the database because it was rolled back.

I also see people trying to put NOLOCK on INSERT, UPDATE, and DELETE statements. Here's a hint: that's not going to work.

Fourth, you need to hold locks for the shortest time that you possibly can. I'm not talking about holding locks across user interactions; that would be just plain dumb. You need to get as much work done as quickly as possible. If you want high concurrency, holding transactions across multiple round trips to the database server isn't clever, yet I see it all the time.

Don't create a proc to insert an order header, then another proc to insert an order detail line, and then call them one by one within a transaction started by the client application. Instead, pass the whole order (or even multiple orders where it makes sense) directly to a procedure on the SQL Server that does the whole job at once. Table-valued parameters are your friend here.
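As a sketch of that pattern (the type, procedure, and table names here are invented for the example, and the header table is assumed to have an identity key):

CREATE TYPE dbo.OrderLineType AS TABLE
(
    ProductID int NOT NULL,
    Quantity int NOT NULL,
    UnitPrice decimal(18,2) NOT NULL
);
GO

CREATE PROCEDURE dbo.InsertCompleteOrder
    @CustomerID int,
    @OrderLines dbo.OrderLineType READONLY
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    -- write the header and all detail lines in a single round trip
    INSERT dbo.OrderHeaders (CustomerID, OrderDate)
    VALUES (@CustomerID, SYSDATETIME());

    DECLARE @OrderID int = SCOPE_IDENTITY();

    INSERT dbo.OrderDetails (OrderID, ProductID, Quantity, UnitPrice)
    SELECT @OrderID, ProductID, Quantity, UnitPrice
    FROM @OrderLines;

    COMMIT;
END;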

Locks typically happen within transactions, so you need to keep transaction durations very short. And that means that you need to manage them properly: start them only when needed, and commit them or roll them back as soon as possible.

The transaction handling in JDBC is a great example of how not to handle transactions.

It turns on chained mode (which means that SQL Server starts a transaction automagically when a data modification occurs), and then just periodically runs a command that says "if there is a transaction, commit it". Please don't do this. At least not if you're looking for high performance and/or concurrency.
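In SQL Server terms, that anti-pattern looks roughly like this (shown here so you can recognize it, not copy it):

-- chained (implicit transaction) mode: a transaction silently starts
-- with the first data modification
SET IMPLICIT_TRANSACTIONS ON;

-- ...and then, periodically...
IF @@TRANCOUNT > 0 COMMIT;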

Serialize the workload

One useful (but possibly contentious) method of reducing deadlocks is to try to ensure that you lock objects in the database in the same order within all your logic.

I remember a site that I went to in the 1980s where they had a rule that tables were always used in alphabetical order. That sounded like a bad idea at the time, and it still sounds like a bad idea.

However, you can make a big difference to deadlocks by using tables in the same logical order every time. For example, if you acquire an order number from a sequence, then use it in an order header, then write the order detail lines, then write an audit entry, and so on, and you keep to that same order every time, you'll have far fewer deadlocks to deal with, as it naturally serializes the work to be done.

When all else fails

Even with all the above, you will still likely have some deadlocks. Apart from all the handling options, you might consider exerting some control over who loses when a deadlock occurs. Perhaps you have a background task that you don't mind dying and restarting. In that case, you can set its deadlock priority low so it offers itself as a victim, rather than just having SQL Server work out who's written the least to the transaction log.
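That's a single session-level statement in the background task:

-- run in the background task's session: if a deadlock occurs,
-- this session volunteers to be the victim
SET DEADLOCK_PRIORITY LOW;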

In the next post, I'll look at retry logic at the server end.

 

SDU Tools: LoginTypes and UserTypes in SQL Server

I write a lot of utility code for SQL Server. Many of the system tables include values for LoginTypeID and UserTypeID but I've never found a view in SQL Server that returns a description of each of those values.

To make it easy, in our free SDU Tools for developers and DBAs, we added two views (LoginTypes and UserTypes) to help.
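Once the tools are installed, they're just views that you can query directly (assuming the default SDU_Tools schema):

SELECT * FROM SDU_Tools.LoginTypes;
SELECT * FROM SDU_Tools.UserTypes;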

You can see the views in action in the image above, and in the video here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

T-SQL 101: #18 – Removing duplicate rows by using DISTINCT

If I query rows from a SQL Server table, I don't always want all the rows returned. In particular, I might not want any duplicates. I can remove them by using DISTINCT.

Here's a simple example. I want a list of the sizes that products can come in, so I execute this query:
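That query would be along these lines (the same Products table appears later in this post):

SELECT Size
FROM dbo.Products;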

Note that although I get a list of sizes, I get a row returned for every row in the table. If I add DISTINCT to the query, look at the effect that it has:
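With DISTINCT added, the query would read:

SELECT DISTINCT Size
FROM dbo.Products;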

The removal of duplicates applies to all the columns listed after DISTINCT, so if I selected Size and Color, I'd get all the combinations of Size and Color that have been used.

The Perils of DISTINCT

I often run into situations where DISTINCT is overused. What happens is that developers don't get their join operations working properly, end up with duplicates, and "fix" the query by adding DISTINCT to the SELECT clause at the top. Try to avoid this by making sure your joins are done properly first. Don't rely on DISTINCT to fix join problems.

DISTINCT vs GROUP BY

In a later post, we'll talk about GROUP BY but it's worth noting that writing:

SELECT DISTINCT Size
FROM dbo.Products
ORDER BY Size;

is really the same as writing:

SELECT Size
FROM dbo.Products
GROUP BY Size
ORDER BY Size;

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

SQL: What's the right column length for storing email addresses in a SQL Server database?

I spend a lot of time reviewing database designs at customer sites. So often, I come across bizarre decisions that have been taken in the choice of data types within the databases. One problem that I see all the time is the way that email addresses have been stored.

One data element or more?

The first thing you need to decide is whether an email address is really one data element or more.

One school of thought (based on strict normalization) is that the local part (i.e. the part in front of the @ symbol) could be stored separately to the domain part (i.e. the part after the @ symbol). I could see arguments for that, or perhaps even for separating components of the domain part, in some sort of purist view, but I don't do that.

I think of an email address as a single data element.

Data type and length?

I then see variations on whether the email address is stored as varchar or nvarchar. And I see variations in the lengths used.

I see people using 60 characters, 100 characters, 150 characters, or whatever they guessed might be big enough. Some even allow 1000 characters. Sadly, I often also see max data types used (i.e. varchar(max)).  Even more sadly, I see the length defined differently in different columns within the database.

Just how long should you allow for email addresses?

There's a standard for that

The correct answer, as it is with many data modelling questions, is to see if there is a standard that defines what's needed. And of course, for email addresses, there is a standard.

RFC 5322 defines email addresses. You'll find it here.

Email addresses are made up of:

  • local part (up to 64 characters)
  • @ sign (1 character)
  • domain part (up to 255 characters)

That means that, in theory, valid email addresses could be up to 320 characters long.

However, as Brian Lawton noted in the comments to this post, there is a further restriction on the overall total. RFC 2821 (here) restricts the total length to 254, as clarified in the errata published here. Interestingly, the errata noted that they had said 255 but then clarified it to 254.

And as for varchar vs nvarchar, the RFC allowed different characters for each part of the email address, but all the allowable characters are contained in the varchar range.

However, since 2012 and RFC 6530 (here), we've had the ability to use non-ASCII values in both parts. Older email servers would not have allowed values like this:

我喜欢@金屋.香港

But servers updated to handle RFC 6530, 6531, 6532 requirements do permit this.

So the appropriate format today, when using SQL Server, would be nvarchar(254).
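In a table design, that looks like this (hypothetical table name):

CREATE TABLE dbo.Contacts
(
    ContactID int IDENTITY(1,1) PRIMARY KEY,
    EmailAddress nvarchar(254) NOT NULL
);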

Is that what you've allowed for in your database designs? I wish I'd applied the current standard in the design of WideWorldImporters samples for SQL Server 2016. I used nvarchar(256), so I was close.

 

 

Fixing Locking and Blocking Issues in SQL Server – Part 5 – Application Deadlock Handling

This is part 5 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking
  • Part 4 looked at what deadlocks really are and how SQL Server handles them

Today, though, I want to look at how applications should deal with deadlocks.

Application Deadlock Handling

In the last post, I talked about what deadlocks are (under the covers) and how SQL Server "resolves" them.

In the end, the statement that is killed by the server fails, and error 1205 gets passed back to the client application.

The problem is that most client applications aren't prepared for the error and just blow up when it occurs. We'll talk more about avoiding deadlocks next time, but with a complex application and many concurrent users, you really aren't going to stop them from happening.

So your application should be prepared to deal with them.

What's the correct action to take when a deadlock occurs? It's just to retry the transaction that you were trying to apply, after a short delay. Ideally, that delay will also be somewhat random and will increase in duration with each retry.

If deadlocks are leaking out to end users when they're running an application, the developers haven't done their jobs.

There are several situations where applications should apply retry logic. Deadlocks are just one issue.

Data Modification Logic

Most applications today, though, update databases like this:

  • Start a transaction
  • Send the update
  • Commit the transaction and hope for the best

For the religious readers, perhaps praying should be part of that list as well.

But it's not good enough.

Better logic is something like:

  • While we haven't yet applied the transaction and haven't exhausted our retries
    • Start the transaction
    • Send the update
    • Commit the transaction
  • Trap any errors and, if a failure occurs and was caused by an error that's worth retrying, try again (a sketch of that pattern follows this list)
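Here's a minimal T-SQL sketch of that shape (the procedure name and the fixed delay are just for illustration; the same pattern applies in client code, where the delay should also be random and increasing):

DECLARE @RetryCount int = 0;
DECLARE @MaxRetries int = 3;

WHILE @RetryCount <= @MaxRetries
BEGIN
    BEGIN TRY
        BEGIN TRANSACTION;

        EXEC dbo.ApplyTheChange; -- hypothetical proc doing the real work

        COMMIT;
        BREAK; -- success, so stop retrying
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0 ROLLBACK;

        IF ERROR_NUMBER() <> 1205 OR @RetryCount >= @MaxRetries
        BEGIN
            THROW; -- not a deadlock (or out of retries), so let it surface
        END;

        SET @RetryCount += 1;
        WAITFOR DELAY '00:00:01'; -- short delay before the next attempt
    END CATCH;
END;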

Applications need to plan for failure, and then be happily surprised when the failure doesn't occur.

In the next post, I'll talk about things you can do to avoid deadlocks in the first place.

 

SDU Tools: List Disabled Indexes in SQL Server Databases

I carry out a lot of basic health checks on SQL Server databases. One issue that I come across fairly regularly is that the databases contain indexes that have been disabled.

Most of the time, when I investigate why that's happened, the customer has used some sort of ETL process and has disabled the indexes to improve loading speed. But then something went wrong, and the indexes didn't get rebuilt (i.e. re-enabled).

To make it easy for you to find any disabled indexes in a database, in our free SDU Tools for developers and DBAs, we added a stored procedure ListDisabledIndexes to help. It takes three parameters:

  • @DatabaseName – the database to check
  • @SchemaName – either ALL (default) or a comma-delimited list of schemas that should be checked
  • @TableName – either ALL (default) or a comma-delimited list of tables that should be checked
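For example, to check every schema and table in a database (again assuming the default SDU_Tools schema):

EXEC SDU_Tools.ListDisabledIndexes
    @DatabaseName = N'WideWorldImporters',
    @SchemaName = N'ALL',
    @TableName = N'ALL';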

You can see the output columns in this image:

As well as identifying the indexes, it shows the key and included columns.

You can see the procedure in action in the image above, and in the video here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

Data Modeling: a tale of a man with only one name

I've lost count of the number of computer systems that I've worked with over the years that insist that everyone has at least two names. Yet there are many people in the community who have only a single name. I'm not just talking about people like Cher, who might well have a stage name but another name on her passport. I'm talking about people who really only have one name.

Can your systems deal with that?

And if they only have one name, which column does it go into?

Need to be PC

In Australia, we have standards for this sort of thing. While many existing systems use words like Surname, LastName, FirstName, ChristianName, etc., those terms aren't considered PC (politically correct or appropriate) now either. In our standards, the terms are:

  • Family Name
  • Given Names

At Meteor (Metadata Online Registry), these are also defined as 40 characters long each, and only the family name is mandatory. That standard says:

Some people do not have a family name and a given name, they have only one name by which they are known. If the person has only one name, record it in the 'Family name' field and leave the 'Given names' field blank.
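A table design following that standard might look like this (hypothetical table name):

CREATE TABLE dbo.People
(
    PersonID int IDENTITY(1,1) PRIMARY KEY,
    FamilyName nvarchar(40) NOT NULL,  -- the only mandatory name
    GivenNames nvarchar(40) NULL
);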

A Tale of a Disruptive Man with One Name

At a place where I previously worked, we had a guy apply for a job, and he only had one name. He'd legally changed his name to just a single word. He wanted nothing to do with his family any more and had removed all reference to them. He was a pretty alternative sort of character, but I so wanted him to come and work with us. He would have broken every system in the place.

But in an interesting twist of fate, he couldn't come to his job interview. The airline told us they were not prepared to allow him to fly in what he was wearing.

To this day, I can't imagine what you could wear to an interview that an airline wouldn't let you fly in.

T-SQL 101: #17 Paginating rows returned from SQL Server T-SQL queries

When I need to display a potentially large number of rows on a screen, I often need to show them in pages. This is called paginating the rows.

For example, if I'm showing bank transactions, I might want to show 25 per page. The challenge is that I also need to be able to ask for a specific page number. So my query becomes: give me page 7 where there are 25 rows per page. In that case, I want to skip the first 150 rows, then ask for the next 25 rows.

Now I could get SQL Server to send me all the rows at once, and I could just choose which rows to display. If there are many thousands though, that could be really wasteful, mostly in transferring all that data from SQL Server to my application. And if I want to only show a single page at a time, I really don't want to get all the rows every time and then just throw most of them away.

While we could do this with T-SQL in all versions of SQL Server, it was messy. I was really pleased when SQL Server 2012 gave us an option to help with this. It came in the form of extra clauses to the ORDER BY clause.

Take a look at the following query:
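Based on the paging example above (page 7, 25 rows per page, with hypothetical table and column names), it would be something like:

SELECT TransactionID, TransactionDate, Amount
FROM dbo.Transactions
ORDER BY TransactionID
OFFSET 150 ROWS
FETCH NEXT 25 ROWS ONLY;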

It's quite elegant. It says to skip the first 150 rows, then return the next 25 rows, exactly as we wanted. To get any particular page, we just subtract one from the page number, multiply by the number of rows per page, and use that as the offset:

OFFSET (@PageNumber - 1) * @RowsPerPage ROWS
FETCH NEXT @RowsPerPage ROWS ONLY

Note that this option isn't really a performance enhancement but that's a story for another day. I like the fact that it is declarative though. You are telling SQL Server what you want, not how to do it. That's almost always a good thing.

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Fixing Locking and Blocking Issues in SQL Server – Part 4 – Deadlocks

This is part 4 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking

Today, though, I want to look at the next potential issue, and that's deadlocks.

Deadlocks

I often hear people interchangeably using the words lock, block, and deadlock.

Locks are a standard feature of systems that are designed for concurrency. Nothing nasty to see here. They become a problem in two situations:

  • They are held too long and cause a blocking problem
  • Incompatible locks are held and cause deadlocks

The simplest example of a lock is when I have an item and you want it, but you must wait until I finish with it. That's just normal with sharing something.

If I finish using it quickly, you might barely even notice. But if I hang onto it for a very long time, you experience blocking. Apart from slowing you down (which might well seem like a problem to you), it's not actually a correctness issue. When I finish, you'll still get it.

As soon as we have more than one item though, things can get messy. If I have item A and I want item B, but you are holding item B because you also want item A, then we have a problem. That's a deadlock.

It also doesn't have to just be two people. I could have item A and want item B. Terry might have item B and want item C. Nga might have item C and want item D. Paula might have item D and want item A. A larger deadlock has occurred.

Resolving Deadlocks – the Old Way

In the past, I worked with systems that didn't resolve this automatically. What would typically happen is that two users would suddenly have their applications freeze. Then a little while later, another user needed something that one of the first two had, and her application would freeze too.

This would go on and on across the building. We used to call it creeping death syndrome.

Eventually, someone would get sick of waiting, and go to find an admin. The admin would work out which user caused the first problem and kill their process. Everyone would then come back to life.

Resolving Deadlocks – the SQL Server Way

In SQL Server, there was a conscious decision not to let this happen and to resolve it automatically. By the word "resolve", though, you need to read "kill one of the users' processes".

SQL Server has a process that looks for these situations and cleans them up. The only question, of course, was which process to kill.

In early versions, they used to kill whichever process had just put the last link in the deadlock chain. That process would return an error explaining it had been chosen as a deadlock victim, and its transaction would be rolled back.

(Note: users don't take well to being told they're victims).

Over time though, this caused issues as long-running processes could collide with short-running processes and get rolled back. They could then have the same issue happen again and again.

So a decision was made to look at the amount of data that had been written to the transaction log by the processes, and kill the one that had done the least work. That worked way better.

That's how SQL Server currently handles deadlocks. In the next post, I'll talk about how you should plan to work around deadlocks.