SQL: Design – Entity Attribute Value Tables (Part 1) – Why?

If you've been working with databases for any length of time, you will have come across implementations of Entity-Attribute-Value (EAV) data models (or non-models as some of my friends would call them).

Instead of storing details of an entity as a standard relational table, rows are stored for each attribute.

For example, let's create a table of people:

When we query it, all is as expected:

And for a long time, this has been how we expect to create tables.

Now, if we changed the previous code to be EAV based, we could do this:

And when we query it, we'd see this:

Clearly this is going to be much harder to work with for all operations apart from reading and writing a list of values by the entity's ID.

So the first question that arises is:

Why would anyone want to have this sort of (non) design in the first place?

There are a few reasons. Some ok, some not ok.

Reason 1: Sparse Data

EAV tables first appeared for situations where there was a large number of potential attributes (ie: columns here) and each entity (ie: row here) only had a few of them. You can see a description of this in Wikipedia here: https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model

Products like SharePoint had this issue, and that's what pushed the SQL Server team to vastly increase the number of columns per table in SQL Server 2008. Prior to that version, you could have 1024 columns per table, and most people would have considered that if you needed more than that, you were doing something wrong. SQL Server 2008 increased the number of columns per table to 30,000, and also added the concept of a SPARSE column. Unlike standard columns that would always have occupied at least one bit in a NULL bitmap, SPARSE columns only occupy space when they contain data.

To avoid issues with indexes on the sparse columns, we also got filtered indexes in that version. They were highly useful for many reasons, not just for sparse columns.

The addition of SPARSE columns allowed SQL Server to use relational tables to get around the reasons that were the basis for the requirement for EAV tables.

Reason 2: Flexibility?

If we assume that we're not working with a sparse design, what's next? I often hear this type of design referred to as "flexible". When I quiz developers on what they mean by that, it basically boils down to:

I don't have to change the database schema when I add or remove attributes, or even tables.

So they have basically created a table to rule them all.

Image by Andy Omvik

This simply isn't a good reason. It might stem from either working in organizations where someone else controls the database and they're difficult to work with, or don't want to make changes quickly enough. But that's fixing the wrong problem. It can also stem from a basic lack of design in the first place.

Reason 3: Object Repository

We only want to use the database as an object repository. All logic occurs outside the database. We just want to serialize objects in and out of the database. If you have this reason, I love you. You keep me in work, dealing with organizations with intractable performance issues. Fine for a toy application, not fine for an enterprise application.

Worse, this type of design tends to lead to information silos where the data storage is specific to the one version of one application. That's not what enterprises need. They typically have many applications that use the same data, and those applications are built on more than one technology stack.

More to Come

Now there are other reasons and I'll discuss them in the next article on this topic, when we discuss the pros and cons of this type of design, but I wanted to start with the definitions and the basic "why" question.

If you have other "why" options, please let me know in the comments.

6 thoughts on “SQL: Design – Entity Attribute Value Tables (Part 1) – Why?”

  1. I know an ISV that has an EAV table in their database because they let customers and 3rd party apps integrate tightly with it, including adding their own attributes to data. In addition to the columns for E, A, and V, there's basically also a column for AttributeSource. (Values can be the customer's name, the 3rd party integration app, etc.)

    The single table design means any 3rd party app is vaguely compatible with other 3rd party apps, and that customers can manage data that comes from any 3rd party app in the same way.

    I'm not necessarily fond of it, but I understand why they did it that way.

    1. Yes, but I often wonder why they still couldn't just use columns. Why create your own database within the database?

      I do often see it for the "user configurable" data. But even then, it shouldn't apply to most of the database, just for those parts. But I regularly see it for almost the whole DB.

      I hear justifications that "the customers need to be able to add some columns" and somehow that translated to "we need to store all our columns that way".

  2. A (former) SQL Server person whose initials are HB once told me that one of the target use cases for XML columns and indexes was just this type of usage. Want to add an attribute or two but can't get the schema changed? Here's a bucket.

    1. Hi Mark yes can be slthough the cool kids would now use a JSON column. For me, they eork best for very sparse columns eg a use by date for food when 99% of your products aren't food. But even for that might prefer a column. Where I do like JSON columns is sparse array data eg perhaps languages spoken by a staff member.

Leave a Reply

Your email address will not be published. Required fields are marked *