Should my database have foreign key constraints?

There is an endless discussion in the development community about whether databases should include declared foreign key constraints or not.

As a consultant, I regularly visit a wide variety of client sites. Many of these have applications designed without constraints. When I ask why there are no constraints, the response is invariably one of the following:

  • The app ensures that it’s right
  • They don’t work well with our application development
  • They are too slow
  • What’s a foreign key constraint?

Most of the sites that I work with have sizeable databases. This discussion is important in those situations.

The app ensures that it’s right

In most enterprises, most large databases end up being used by more than one single application, and often each application will be made up of many sub-applications or modules. That means that every place that accesses the data needs to apply the same rules. (The same issue applies to other forms of constraints as well).

Worse, the databases often end up being accessed by applications from different authors, and often with different technology stacks.

ETL processes are often used to move data into the databases or to update that data.

Even worse, in real production scenarios, data-fixes often get applied directly to the database.

A view of the world that says that it will be ok because everything goes through a single application is a very narrow view of the world.

They don’t work well with our application development

What this really boils down to in most cases is that the developers didn’t want to work out the order for updates and it’s just easier for them without constraints as they can update any table they want, in any order.

That means that the data is periodically in invalid states and (hopefully) eventually consistent. But it also means that the data has many interim states that are actually invalid states. What happens with concurrent access at that point? What happens to a reporting application that finds invoices for customers that don’t exist?

It would be helpful if SQL Server supported deferred constraint checking but today it doesn’t. I’ve been formally asking them for it for over 10 years: https://connect.microsoft.com/SQLServer/feedback/details/124728/option-to-defer-foreign-key-constraint-checking-until-transaction-commit

I still think it’s one of the most important enhancements that could be made to SQL Server from a development point of view.

They are too slow

So often I’m told “we can’t do it because it would be too slow”. Yet almost every time I ask if they’ve actually tested it, I’m told “no”.

The reality is usually that someone’s brother’s cousin’s friend read it somewhere on the Internet so they decided it would be a problem.

Whenever I really test it, I find very little impact as long as appropriate indexing is in place, and sensible options are chosen when bulk importing data.

I do occasionally find specific constraints where I decide to disable them but they are few and far between. And even then, I like to see the constraint still be in place (so it can be discovered by tools, etc. ) but just not checked. It certainly never applies to all constraints in a database.

What’s a foreign key constraint?

Sadly this is also a common question, fortunately mostly only on smaller databases and applications. Some developers really just don’t understand the issue but this is not most developers.

Bottom Line

When I do a detailed check on a system that has run without constraints for quite a while, I almost always find data integrity issues.

When I show the integrity issues to the customer, they look surprised then say something like “ah, that’s right, we had that bug a few weeks back”. Pretty much every time, there are some issues somewhere.

That’s ok if you’re building a toy application; not ok if you’re building a large financial application.

At the very least, sites that decide not to have constraints should have a process in place that periodically checks data integrity. Most sites that don’t have constraints don’t do this either.

It’s important to find issues as soon as they occur. Finding issues well after they have been created always makes them much harder to fix. At least with constraints, you get instant feedback that something’s wrong. If you have a bug in your ETL process that is messing up your data in some subtle way, you don’t want to find that out several weeks later after other processing has occurred on that data.

Summary

I like to see constraints in place until it’s proven that a particular constraint can’t be in place for a performance reason. Then, I like to see it (and only it) disabled rather than removed. A bonus is that it can usually still be discovered by tools.

Data Tales #8: Database on a Diet (Part 3)

Hi Folks,

My series of articles for SQL Server Magazine continues. Last time, I continued a short series about a large database that needed to go on a diet. Last time, I look at the internals of row and page compression, to see what happens when they are used. We saw the significant differences in how ROW and PAGE compression are implemented. So how do you decide what to use?

This time, we look at when ROW and PAGE compression make sense, and provide detailed guidance on how to decide which should be used for which tables and indexes, or even for which partitions of which tables and indexes. A blended approach is usually the appropriate outcome.

http://sqlmag.com/sql-server/data-tales-8-case-database-diet-part-3

Enjoy!

Avoiding drives going offline when adding nodes to availability groups or creating / validating Windows clusters

Several of my customers lately have mentioned that when they’ve added a node to a Windows failover cluster for supporting an Availability Group, that they’ve ended up with an outage because the non-shared drives have been taken offline by the installation process. All commented that it didn’t use to happen on Windows Server 2008 R2 and they were taken by surprise when it happened on Windows Server 2012 R2.

I mentioned it in the MVP list and one of my US buddies Allan Hirt said that we need to uncheck the relevant checkbox during the install.

I stepped through another one today and this is the culprit:

image

The default is now to add all eligible storage to the cluster.

That is perhaps a reasonable default when setting up most Windows failover clusters, but it is really not a sensible default when working with Availability Groups. It ends up adding all your other drives and then puts them offline.

Unchecking that box while performing installs/validations should avoid the issue.

Thanks to Allan for pointing it out.

SQL Down Under Podcast 68–Guest Joe Yong–SQL Server 2016 StretchDB

Hi Folks,

One of the intriguing options that’s coming as part of SQL Server 2016 is StretchDB. I’ve been spending some time working with it and learning about it, and I’ve ended up with lots of questions.

It was great to get an opportunity to speak to Joe Yong to get all my questions sorted.

Hope you’ll enjoy it too. You’ll find it here: http://sqldownunder.azurewebsites.net/Podcasts