In an earlier post, I discussed the design of EAV (Entity Attribute Value) tables, and looked at why they get used. I'd like to spend a few moments now looking at the pros and cons of these designs.
Let's use the same table as the last time as an example:
The main positive that's typically described is that the schema is "flexible". By this, the developers usually mean "I don't have to change the database schema (or worse, have someone else change it) when my needs change".
I don't buy this argument. Databases do a great job of managing metadata. Why would your app do it any better? I'll be there's another reason (perhaps political) as to why this argument is being put forward in most cases.
Another argument is that this type of design handles sparse data gracefully. When we designed the WideWorldImporters database for Microsoft, we had a few edible products. The vast majority of the products were not edible. So the question is about where a use-by-date for an edible item should be stored. Should every product have that column?
Sparse columns in SQL Server deal with that quite well, but in SQL Server 2016, we also wanted to showcase some of the new JSON-based functionality, so we ended up putting this info into a JSON-based column. I would have been just as happy with a sparse column for this though. AT least that would have allowed us good control over the data type.
I see many problems with these tables.
The most obvious problem is that a large number of joins is required to get all the related data for an entity if you ever need to process this within the database. Many of these applications, though, just read and write the entire entity each time and aren't concerned about this. These applications though tend to be at the small end of the market and aren't too concerned about performance.
A second key issue is data typing. In the example above, what data type is the Value column? It's most likely some sort of string. If you're one of the cool kids, it might be a sql_variant instead. In this case, let's assume it's a string of some type. How can we then provide any sort of guarantees on what's stored in the column? The LoyaltyGrade is clearly an integer in this example, but there's nothing to stop us putting Gold in that field instead.
Ah, but "the app does that" I'm told. That might be true but apps have bugs, as do ETL processes that put data in those fields. That's also very small-system thinking as most organizations will have a large number of applications that want to access this same data, and they won't even be built on the same technology stack, so lots of luck forcing all the access through your app or services layer. Ever tried to connect something like Excel to a service?
What I often then eventually see is typed columns within the table:
and so on. Then we have to consider what it means if a value happens to be in more than one of these. (Ah yes, the app won't allow that).
Once again though, what you're doing is building a database within the database, just to avoid adding or dropping columns.
Yet another problem is the complete lack of referential integrity. You have no way (or at least no ugly way) to ensure that values lie in a particular domain. Is it OK for Brunette to be entered rather than Brown for the hair color? If not, how do you enforce that? (I know, I know, the app does that).
These are just a few of the issues, and there are many more.
Before I finish for today, I do want to mention yet another design that I saw recently. Instead of one table to rule them all, with an EAV-based-non-design, what this database contained was a table for each data type. Yes there was a string table, an int table, a date table, etc.
It should be obvious that such a design is even nastier.
6 thoughts on “SQL: Design – Entity Attribute Value Tables (Part 2) – Pros and Cons”
I call this the "Thing" database, where "Everything's a Thing".
Really bad idea, but I've seen it implemented in various forms. These database rank in my top 5 worst databases I've ever seen ! Especially if you are trying to unravel them from a Data Warehousing perspective.
One real world example that always makes me first think EAV is the survey or request application. A survey will have a growing list of questions which can have various data types for the response. You will have several versions of surveys in the life of the app and questions may or may not be reused. Request systems are very similar in that they typically involve a changing list of questions and answers for different request types (think request for service, or labor, or parts). Have you ever worked on a survey app and seen how they handle the problem? I assume they do something like EAV.
Hi Matt, yep, but doesn't need to be an EAV app. Still works just fine with normal tables.
Hi Greg, but how json type field is going to solve referential integrity problem?
Hi Audrius, my take on it is that if you really want to process data in any way within the DB, you need to either store it all in relational columns, or alternately, you promote certain columns out of the data, to relational columns and then use those.
There are use cases where the EAV design is particular useful.
I have some clients that installed my hybrid EAV design and one of them has a table with now 40+ billion rows (40 000 000 000).
And response time (yes, you can search for EVERYTHING, including wild cards) and response time is less than 500 milliseconds.