There is an endless discussion in the development community about whether databases should include declared foreign key constraints or not.
As a consultant, I regularly visit a wide variety of client sites. Many of these have applications designed without constraints. When I ask why there are no constraints, the response is invariably one of the following:
- The app ensures that it’s right
- They don’t work well with our application development
- They are too slow
- What’s a foreign key constraint?
Most of the sites that I work with have sizeable databases. This discussion is important in those situations.
The app ensures that it’s right
In most enterprises, most large databases end up being used by more than one single application, and often each application will be made up of many sub-applications or modules. That means that every place that accesses the data needs to apply the same rules. (The same issue applies to other forms of constraints as well).
Worse, the databases often end up being accessed by applications from different authors, and often with different technology stacks.
ETL processes are often used to move data into the databases or to update that data.
Even worse, in real production scenarios, data-fixes often get applied directly to the database.
A view of the world that says that it will be ok because everything goes through a single application is a very narrow view of the world.
They don’t work well with our application development
What this really boils down to in most cases is that the developers didn’t want to work out the order for updates and it’s just easier for them without constraints as they can update any table they want, in any order.
That means that the data is periodically in invalid states and (hopefully) eventually consistent. But it also means that the data has many interim states that are actually invalid states. What happens with concurrent access at that point? What happens to a reporting application that finds invoices for customers that don’t exist?
It would be helpful if SQL Server supported deferred constraint checking but today it doesn’t. I’ve been formally asking them for it for over 10 years: https://connect.microsoft.com/SQLServer/feedback/details/124728/option-to-defer-foreign-key-constraint-checking-until-transaction-commit
I still think it’s one of the most important enhancements that could be made to SQL Server from a development point of view.
They are too slow
So often I’m told “we can’t do it because it would be too slow”. Yet almost every time I ask if they’ve actually tested it, I’m told “no”.
The reality is usually that someone’s brother’s cousin’s friend read it somewhere on the Internet so they decided it would be a problem.
Whenever I really test it, I find very little impact as long as appropriate indexing is in place, and sensible options are chosen when bulk importing data.
I do occasionally find specific constraints where I decide to disable them but they are few and far between. And even then, I like to see the constraint still be in place (so it can be discovered by tools, etc. ) but just not checked. It certainly never applies to all constraints in a database.
What’s a foreign key constraint?
Sadly this is also a common question, fortunately mostly only on smaller databases and applications. Some developers really just don’t understand the issue but this is not most developers.
When I do a detailed check on a system that has run without constraints for quite a while, I almost always find data integrity issues.
When I show the integrity issues to the customer, they look surprised then say something like “ah, that’s right, we had that bug a few weeks back”. Pretty much every time, there are some issues somewhere.
That’s ok if you’re building a toy application; not ok if you’re building a large financial application.
At the very least, sites that decide not to have constraints should have a process in place that periodically checks data integrity. Most sites that don’t have constraints don’t do this either.
It’s important to find issues as soon as they occur. Finding issues well after they have been created always makes them much harder to fix. At least with constraints, you get instant feedback that something’s wrong. If you have a bug in your ETL process that is messing up your data in some subtle way, you don’t want to find that out several weeks later after other processing has occurred on that data.
I like to see constraints in place until it’s proven that a particular constraint can’t be in place for a performance reason. Then, I like to see it (and only it) disabled rather than removed. A bonus is that it can usually still be discovered by tools.