SQL: Are big SQL Server databases really slower?

One question that I'm asked all the time when consulting is whether reducing the size of database tables will make queries run faster or not.

The underlying question is typically about whether the client should implement some sort of archiving strategy, to reduce the amount of data in a table by moving older data off into another table.

My answer is that it might help, but if it does, you probably have another issue that would be a better one to solve instead.

When you need to read some data from a table, and you know where the data is, it doesn't matter how much other data is in the table. The only time that the amount of data matters is if you don't know where the data you are looking for is located.

Imagine you have walked into a physical library with a very large number of books. If you look in an index of authors to find the book by your favorite author that you want to read, and it tells you exactly where it is in the library, does it really make a difference if the library is expanded to double the number of books?

Image by J Zamora

You might have to walk a little further, but the process is still much the same, and the time taken will be much the same. But if you didn't have that index of authors, it makes a huge difference, because now you'd have to look through twice as many books to find the one that you need.

Having more data in a table than you need to access is only a problem if you are reading all that data.

The tell-tale signs of this sort of problem are obvious. If you have queries that get slower as the system holds more data, you probably have an issue to solve, and it's likely an indexing problem.

11 thoughts on “SQL: Are big SQL Server databases really slower?”

  1. Having multiple data files and Partitioning are also options to improve performance, but partitioning requires buy in from the application vendor, which is not always an option.

    In Library parlance, having multiple files is like adding more shelves so that you can get to the right area quicker. Partitioning is like having multiple libraries – e.g. data before 1950 in one library and data after 1950 in another just across the road, so that you can quickly get to the collection of data that you are looking for.

    Is there a way to implement data partitioning without involving the vendor? It may be possible by swapping tables for views?

    1. I don't tend to think of partitioning as a performance enhancement. In general, it can have the opposite effect without appropriate use of partition-aligned indexes, etc. I see partitioning as a management enhancement. The one exception is that we often use partitioning to switch in/out data that must be available pretty much 24×7.

  2. Do you think a good scenario to reduce the size would be in the case of slow inserts? Since the table has several indexes to maintain.

      1. @greglow are you saying that a good Clustered or Non-Clustered index is better when the tree is wider than it is deep? Not sure what is spelled and misspelled in your last post.

        1. Sorry, something went very wrong with typing on the phone. I've corrected it. What I'm saying is that the difference in depth between indexes for large and small tables, just isn't that great. Indexes in SQL Server have a high span out. There's a common misconception that they are often deep. That's usually not true.

  3. Archiving is often done not as a performance improvement, but for easier maintenance. The archived databases do not need updating statistics and rebuilding indexes nor CHECDB. Backups are way less frequent too. There may be a few tables that are updated like user log, but archived databases are pretty static. The data in them can be compressed as well to save space. The active databases are then smaller and easier to manage in short maintenance windows (if any). If you are in the cloud, you may be paying for those maintenance tasks, so smaller is less expensive.

    Then even the best indexing won't help if the where close is
    WHERE col LIKE '%'.
    And that code can be a product of user filter plus an ORM robot. Not easy to track down and fix. Complex applications have logic of their own, and people at the back-end will continue being resourceful to keep the lights on.

    1. Archiving and partitioning are about management and maintenance. That's the point. They aren't about performance.

      The attributes you are assigning to archive databases can just as easily be assigned to partitioned tables, and without the need for cross-db queries.

      I see little difference with things like backups. If I have large tables, I can partition them, make the older partitions page compressed, 100% fillfactor, etc. and read-only. I only have to backup read-write filegroups. The overall size of the DB makes little difference if managed appropriately.

      In the case of LIKE '%', the issue is which data needs to be returned. If it's only recent data, then the addition of a recent partition key to the query would fix that, and if it's all the data, querying an active table plus one in an archive db is even more expensive.

      1. By archiving I meant data that is not needed by users anymore, but must be kept for auditing or other legal requirements. (E.g. all student records must be kept for 50 years.) In case that the cross database queries are needed on the regular basis, it is not the right solution.
        LIKE '%' is generated by application and the vendor doesn't even know from where in the application. Bear in mind that most software shops do not program in Java or C#, but in ORM that is written in a programming language by programmers, not developers. The holy grail of ORM "programing" is not to have any code written in SQL, it is a heresy. So in short they have no idea what data need to be returned. If they did, the filter condition would have been there.
        It's just a real world.
        The problem with partitions in OLTP (It is obvious how to use them in DW), is to find the right partition column. OLTP data is queried in so many different ways. If you have a query that does not have a partition column name in the where clause, that query will span partitions and perform worse than if there were no partitions.
        Always have in mind that developers don't see the database, nor they want to know about it. They see "persistence layer", and that's what they work with.

        Disclaimer: I totally agree that for well designed data model and application database size does not matter. However I am in IT business for 25 years and seldom had opportunity to see one.

Leave a Reply

Your email address will not be published. Required fields are marked *