SQL: Storing the names of objects in SQL Server tables and variables

When I'm writing SQL Server scripts, I often need to store the name of SQL Server objects (like databases, schemas, tables, procedures, etc.) in variables.

That can also happen when I'm creating tables. I might need to store a SQL Server object name (like a login name or a user name) in a column of a table.

So which data type should be used? varchar(100), varchar(200), nvarchar(max), etc. etc. ??

The answer is: sysname

sysname is a built-in data type that's currently stored as nvarchar(128).

While you could use nvarchar(128), that wouldn't be a great idea. If the SQL Server team ever need to change the size of object names, you'll have a problem if you'd used nvarchar(128). If, as recommended, you'd used the sysname data type, you'll be fine.

Fixing Locking and Blocking Issues in SQL Server – Part 6 – Avoiding Deadlocks

This is part 6 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking
  • Part 4 looked at what deadlocks really are and how SQL Server handles them
  • Part 5 looked at how applications should handle deadlocks

Today, though, I want to look at how to try to avoid deadlocks in the first place.

You Can't

Let's start with the bad news. You can't totally avoid deadlocks without doing something like serializing all access to the database.

As I mentioned in Part 5, you need to expect and handle deadlocks when they occur.

Now we have the bad news out of the way, let's look at things that can help to minimize them. Even though we make our applications capable of handling them, I want to see as few of them happening as possible.

Fix the query performance

Part one of this series talked a bit about this. I often see significant blocking issues (and associated deadlocks) completely disappear when the queries run quickly.

The longer that a query runs for, the longer it's likely to hold locks. We want to minimize that time. Indexing is your friend here.

Minimize the locking (volume and duration)

The fewer things that are blocked, and the shorter time that they're blocked, the less chance there are for deadlocks in the first place. There are four key aspects of this:

First, (no surprise), indexing is a key issue here. If you don't give SQL Server suitable indexes to let it find just the data that you need to lock, you shouldn't be surprised when it ends up locking entire partitions or tables.

Second, the design of your tables is also an issue. A properly normalized database schema is easier to work with in terms of minimizing locks.

Third, you need to use the lowest isolation level that works for your queries. Part two of this series talked about using RCSI. Often that will really help. Your biggest pain point though is going to come from applications that insist on using Serializable as an isolation level. Unfortunately, many Microsoft applications default to using this (I'm looking at you Biztalk's SQL Adapter), as do many other things in Windows like Component Services. Whenever you have a choice and it's appropriate, reduce the isolation level to the lowest you can work with.

You might be wondering about the use of NOLOCK here. Clearly it might help if you can deal with the risk. Basically, you're giving up consistency for reduced locking and performance. If you're working only with archive data that isn't changing, it might be an ok option.

If you're working with data that's being actively changed, it's a significant risk. It's hard to explain duplicated or missing rows on reports, and even harder to explain data on your report that doesn't exist in the database because it was rolled back.

I also see people trying to put NOLOCK on INSERT, UPDATE, DELETE. Here's a hint: that's not going to work.

Fourth, you need to hold locks for the shortest time that you possibly can. I'm not talking about holding locks across user interactions, that would be just plain dumb. You need to get as much work done as quickly as possible. If you want high concurrency, holding transactions across multiple round trips to the database server isn't clever, yet I see it all the time.

Don't create a proc to insert an order header, then another proc to insert an order detail line, and then call them one by one within a transaction started by the client application. Instead, pass the whole order (or even multiple orders where it makes sense) directly to a procedure on the SQL Server that does the whole job at once. Table-valued parameters are your friend here.

Locks are typically happening within transactions so you need to keep the transaction durations very short. And that means that you need to manage them properly. Start them only when needed and commit them or roll them back as soon as possible.

The transaction handling in JDBC is a great example of how not to handle transactions.

It turns on chained mode (which means that SQL Server starts a transaction automagically when a data modification occurs), and then just periodically runs a command that says "if there is a transaction, commit it". Please don't do this. At least not if you're looking for high performance and/or concurrency.

Serialize the workload

One useful, (but possibly contentious) method of reducing deadlocks, is to try to ensure that you lock objects in the database in the same order within all your logic.

I remember a site that I went to in the 1980s where they had a rule that tables were always used in alphabetical order. That sounded like a bad idea at the time, and it still sounds like a bad idea.

However, you can make a big difference to deadlocks by using tables in the same logical order every time. For example, if you acquire an order number from a sequence, then using it in an order header, then write the order detail lines, write an audit entry, etc. and keep to that same type of order, you'll have a lot less deadlocks to deal with, as it will naturally serialize the work to be done.

When all else fails

Even with all the above, you still will likely have some deadlocks. Apart from all the handling options, you might consider exerting some control over who loses when a deadlock occurs. Perhaps you have a background task that you don't mind dying and restarting. In that case, you can set its deadlock priority low so it offers itself as a victim, rather than just having SQL Server work out who's written the least to the transaction log.

In the next post, I'll look at retry logic at the server end.

 

SDU Tools: LoginTypes and UserTypes in SQL Server

I write a lot of utility code for SQL Server. Many of the system tables include values for LoginTypeID and UserTypeID but I've never found a view in SQL Server that returns a description of each of those values.

To make it easy, in our free SDU Tools for developers and DBAs, we added two views (LoginTypes and UserTypes) to help.

You can see the views in action in the image above, and in the video here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

Book Review: The Little Book of Luck

One of the things that I love about digital books and audio books is how quickly I can go from a friend talking about one, to actually having it. This book is one of those. I can't actually remember who recommended this one but I recall looking it up immediately and purchasing it. It's The Little Book of Luck by Richard Wiseman.

Wiseman is a professor for public understanding of psychology in the UK. He states his interests as "unusual areas including deception, luck, humour, and the paranormal".

I don't share his interest in the paranormal. While it can mean things that are just quite out of the ordinary, the common usage of paranormal now tends to be things that aren't able to easily explained (at least right now), and that seem to break things like the laws of nature. I don't see any concrete evidence for any of those.

However, this book tackled "luck".

I've spent quite a bit of time in Asian communities over the last decade and "luck" is something that all these communities seem to have a profound belief in, and seek out. Try selling a house number 24 to a Chinese person compared to selling them a house number 8. Lots of "luck" with that.

Clearly, I don't believe that "luck" is a thing. And while Wiseman doesn't come out and say that directly, he does give plenty of hints that what appears to be luck is often just a positive view of something that's happened.

I like the way that the book talks about ways to view the world, and basically, building an optimistic outlook all the time.

It's only a short book and while you could read the whole thing in half an hour if you rushed through it, it's worth spending some time contemplating his ideas.

Bottom line?

I didn't expect to but I did enjoy reading this small book,  and it's a pretty easy read. I was really expecting more but it might be just the thing to brighten your day.

Greg's rating: 6 out of 10

Note: as an Amazon Associate I earn (a pittance) from qualifying purchases but whether or not I recommend a book is unrelated to this. One day it might just help cover some of my site costs. (But given the rate, that's not really likely anyway).

T-SQL 101: #18 – Removing duplicate rows by using DISTINCT

If I query rows from a SQL Server table, I don't always want all the rows returned. In particular, I might not want any duplicates. I can remove them by using DISTINCT.

Here's a simple example. I want a list of the sizes that products can come in, so I execute this query:

Note that although I get a list of sizes, I get a row returned for every row in the table. If I add DISTINCT to the query, look at the effect that it has:

The removal of duplicates applies to all the columns listed after DISTINCT, so if I selected Size and Color, I'd get all the combinations of Size and Color that have been used.

The Perils of DISTINCT

I often run into situations where DISTINCT is overused. What happens is that developers don't get their join operations working properly, end up with duplicates, and "fix" the query by adding DISTINCT to the SELECT clause at the top. Try to avoid this by making sure your joins are done properly first. Don't rely on DISTINCT to fix join problems.

DISTINCT vs GROUP BY

In a later post, we'll talk about GROUP BY but it's worth noting that writing:

SELECT DISTINCT Size
FROM dbo.Products
ORDER BY Size;

is really the same as writing:

SELECT Size
FROM dbo.Products
GROUP BY Size
ORDER BY Size;

Learning T-SQL

It's worth your while becoming proficient in SQL. If you'd like to learn a lot about T-SQL in a hurry, our Writing T-SQL Queries for SQL Server course is online, on-demand, and low cost.

Learning Mandarin: Personal Pronouns and Adjectives

In English, we have personal pronouns and adjectives like:

I, we, you, your, our, my

So what are the equivalents in Mandarin?

(Wǒ) means "I". It's pronounced pretty much like the English "war". However, in English, a possessive form is my. In Mandarin, we use the same word as for I, but add on an indication that it's possessive.

我的 (Wǒ de) means "my". The de is pronounced a bit like "da" and

我的朋友 (Wǒ de péngyǒu) means "my friend".

To make it plural, there is another qualifier that we can add.

我们 (Wǒmen) is "we".

And just like before, we add to make it possessive.  So

我们的 (Wǒmen de) means "our" and

我们的朋友 (Wǒmen de péngyǒu) means "our friend"

The word for you is (Nǐ), the plural word is the same in English but it's

你们 (Nǐmen) in Mandarin. And no surprise that

你们的 (Nǐmen de) means "your".

Two Other Oddities

There are two other oddities that I want to mention today.

In English, we just have the word "you". Mandarin has two forms of this. We just saw

(Nǐ)

but there is another, more polite form, that's

(Nín)

It's often used to honor customers, people older than yourself, etc.

The other interesting one is the word for "we". In English, that might or might not include the person you are speaking to. It might include yourself and someone else, or yourself and the person you're speaking to, or perhaps also someone else.

We saw

我们 (Wǒmen)

before as the word for "we". However, if the person you're speaking to is also included, the correct term is

咱们 (Zánmen)

Learning Mandarin

I'll write more soon on the best methods for learning. If you want to get a taste for it in the meantime though, my current favorite site is iTalki, and my favorite teacher by far is Amy He. If you decide to try it, click here and it's cheaper for both you and me.

SQL: What's the right column length for storing email addresses in a SQL Server database?

I spend a lot of time reviewing database designs at customer sites. So often, I come across bizarre decisions that have been taken in the choice of data types within the databases. One problem that I see all the time is the way that email addresses have been stored.

One data element or more?

The first thing you need to decide is whether an email address is really one data element or more.

One school of thought (based on strict normalization) is that the local part (i.e. the part in front of the @ symbol) could be stored separately to the domain part (i.e. the part after the @ symbol). I could see arguments for that, or perhaps even for separating components of the domain part, in some sort of purist view, but I don't do that.

I think of an email address as a single data element.

Data type and length?

I then see variations on whether the email address is stored as varchar or nvarchar. And, I see variations on lengths.

I see people using 60 characters, 100 characters, 150 characters, or whatever they guessed might be big enough. Some even allow 1000 characters. Sadly, I often also see max data types used (i.e. varchar(max)).  Even more sadly, I see the length defined differently in different columns within the database.

Just how long should you allow for email addresses?

There's a standard for that

The correct answer, as it is with many data modelling questions, is to see if there is a standard that defines what's needed. And of course, for email addresses, there is a standard.

RFC 5322 defines email addresses. You'll find it here.

Email addresses are made up of:

  • local part (up to 64 characters)
  • @ sign (1 character)
  • domain part (up to 255 characters)

That means that, in theory, valid email addresses could have been up to 320 characters.

However, as Brian Lawton noted in the comments to this post, there is a further restriction on the overall total. RFC 2821 (here) restricts the total length to 254, as clarified in the errata published here. Interestingly, the errata noted that they had said 255 but then clarified it to 254.

And as for varchar vs nvarchar, the RFC had different allowed characters for each part of the email address but all the allowable characters for email addresses are all contained in the varchar range.

However, since 2012 and another RFC 6530 (here), we've had the ability to use non-ASCII values in both parts. Original email servers would not have allowed values like this:

我喜欢@金屋.香港

But servers updated to handle RFC 6530, 6531, 6532 requirements do permit this.

So the appropriate format today, when using SQL Server, would be nvarchar(254).

Is that what you've allowed for in your database designs? I wish I'd applied the current standard in the design of WideWorldImporters samples for SQL Server 2016. I used nvarchar(256), so I was close.

 

 

Fixing Locking and Blocking Issues in SQL Server – Part 5 – Application Deadlock Handling

This is part 5 in a series of posts:

  • Part 1 covered being sure there is a locking and blocking issue
  • Part 2 covered the impact of RCSI
  • Part 3 looked at the impacts of indexing on locking and blocking
  • Part 4 looked at what deadlocks really are and how SQL Server handles them

Today, though, I want to look at how applications should deal with deadlocks.

Application Deadlock Handling

In the last post, I talked about what deadlocks are (under the covers) and how SQL Server "resolves" them.

In the end, the statement that is killed by the server fails and an error  (1205) gets passed back to the client application.

The problem is that most client applications aren't prepared for the error, and then just blow up themselves. We'll talk more about avoiding deadlocks next time but with a complex application and many concurrent users, you really aren't going to stop them happening.

So your application should be prepared to deal with them.

What's the correct action to take when a deadlock occurs? It's just to retry the transaction that you were trying to apply, after a short delay. Ideally, that delay will also be somewhat random and will be increasing in duration with each retry.

If deadlocks are leaking out to end users when they're running an application, the developers haven't done their jobs.

There are several situations where applications should apply retry logic. Deadlocks are just one issue.

Data Modification Logic

Most applications today though, update databases like this:

  • Start a transaction
  • Send the update
  • Commit the transaction and hope for the best

For the religious readers, perhaps praying should be part of that list as well.

But it's not good enough.

Better logic is something like:

  • While we haven't yet applied the transaction and haven't exhausted our retries
    • Start the transaction
    • Send the update
    • Commit the transaction
  • Trap any errors, and if a failure occurs and it was caused by an error that it's worth retrying, try again

Applications need to plan for failure, and then be happily surprised when the failure doesn't occur.

In the next post, I'll talk about things you can do to avoid deadlocks in the first place.

 

SDU Tools: List Disabled Indexes in SQL Server Databases

I carry out a lot of basic health checks on SQL Server databases. One issue that I come across fairly regularly is that the databases contain indexes that have been disabled.

Most of the time, when I investigate why that's happened, the customer has used some sort of ETL process and has disabled the indexes to improve loading speed. But then something went wrong, and the indexes didn't get rebuilt (i.e. re-enabled).

To make it easy for you to find any disabled indexes in a database, in our free SDU Tools for developers and DBAs, we added a stored procedure ListDisabledIndexes to help. It takes three parameters:

@DatabaseName – the database to check
@SchemaName – either ALL (default) or a comma-delimited list of schemas that should be checked
@TableName – either ALL (default) or a comma-delimited list of tables that should be checked

You can see the output columns in this image:

As well as identifying the indexes, it shows the key and included columns.

You can see the procedure in action in the image above, and in the video here:

To become an SDU Insider and to get our free tools and eBooks, please just visit here:

http://sdutools.sqldownunder.com

 

Data Modeling: a tale of a man with only one name

I've lost count of the number of computer systems that I've worked with over the years, that insist that everyone has at least two names. Yet there are many people in the community who only have a single name. I'm not just talking about people like Cher who might well have a stage name but then have another name on her passport. I'm talking about people who really only have one name.

Can your systems deal with that?

And if they only have one name, which column does it go into?

Need to be PC

In Australia, we have standards for this sort of thing. While many existing systems will use words like Surname, LastName, FirstName, ChristianName, etc. they aren't considered PC (politically correct or appropriate) now either. In our standards, the terms are:

Family Name
Given Names

Only the family name is mandatory. At Meteor (Metadata Online Registry), these are also defined as 40 characters long each. And only the family name is mandatory. That standard says:

Some people do not have a family name and a given name, they have only one name by which they are known. If the person has only one name, record it in the 'Family name' field and leave the 'Given names' field blank.

A Tale of a Disruptive Man with One Name

At a place that I previously worked, we had a guy apply for a job, and he only had one name. He'd legally changed his name to just a single word. He wanted nothing to do with his family any more and removed all reference to them. He was a pretty alternate sort of character but I so wanted him to come and work with us. He would have broken every system in the place.

But in an interesting twist of fate, he couldn't come to his job interview. The airline told us they were not prepared to allow him to fly in what he was wearing.

To this day, I can't imagine what you could wear to an interview but an airline wouldn't let you fly in it.