Suggest Compression Strategies for Tables and Indexes

I make no secret of the fact that I think that table compression was the best thing added to SQL Server 2008.

It seems like every week that I’m talking to clients about it. They have Enterprise Edition but they weren’t using it because they were familiar with O/S level disk compression and they think that compression will mean smaller and slower, yet this often couldn’t be further from the truth for customers whose databases are currently I/O bound or who don’t have enough memory in their buffer cache for it to work correctly.

It’s important though, to make sure the correct form of compression is used for each table and index. In fact, the decision should be taken for each partition of each table and index if the table is partitioned. We often recommend different compression strategies for different partitions of the same table.

To make it easier to decide which form of compression might suit, I’ve created another version of the script that I use as a first pass in determining compression strategies. The earlier version suggested ROW and/or PAGE compression. While this version can do the same, it also considers the clustered columnstore indexes that are part of the Enterprise Edition of SQL Server 2014. (I’ve included a switch to turn that function on or off in case you don’t have 2014 yet).

The cutoff points are configurable but my default values are as shown.

  • CCI (Clustered Columnstore Index) will be recommended where the partition is scanned more than 95% of the time, updated less than 10% of the time, seeks and lookups are less than 5% of the time, and where there are at least 800,000 rows. It will also only be recommended if it is supported.
  • PAGE will be recommended where the partition is scanned more than 75% of the time and updated less than 20% of the time.
  • ROW will be recommended in all other cases. We believe that ROW should be the default in SQL Server across the board, instead of NONE.

It is important that this script only be run after the system has been in use for long enough to have experienced typical usage patterns.

I hope this helps you to get into using compression where it makes sense for you.

—————————————————————————–

— Suggest data compression changes for tables and indexes

— Dr Greg Low

— March 2015

—————————————————————————–

DECLARE @ClusteredColumnstoreScansCutoff int = 95;

DECLARE @ClusteredColumnstoreUpdatesCutoff int = 10;

DECLARE @ClusteredColumnstoreSeeksLookupsCutoff int = 5;

DECLARE @ClusteredColumnstoreTotalRowsCutoff bigint = 800000;

 

DECLARE @PageCompressionScansCutoff int = 75;

DECLARE @PageCompressionUpdatesCutoff int = 20;

 

DECLARE @IsClusteredColumnstoreSupported bit = 1;

—————————————————————————–

 

WITH IndexUsageStats

AS

( SELECT object_id AS ObjectID,

         index_id AS IndexID,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0) AS Seeks,

         COALESCE(user_scans, 0) + COALESCE(system_scans, 0) AS Scans,

         COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0) AS Lookups,

         COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Updates,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0)

           + COALESCE(user_scans, 0) + COALESCE(system_scans, 0)

           + COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0)

           + COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Operations

  FROM sys.dm_db_index_usage_stats

  WHERE database_id = DB_ID()

),

PartitionUsageDetails

AS

( SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName,

         i.index_id AS IndexID,

         i.type_desc AS IndexType,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST((COALESCE(Seeks, 0) + COALESCE(Lookups, 0)) * 100.0

                        / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS SeekLookupPercentage,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Scans, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS ScanPercentage,

         CASE WHEN COALeSCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Updates, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS UpdatePercentage,

         p.partition_number AS PartitionNumber,

         p.data_compression_desc AS CurrentCompression,

         p.rows AS TotalRows

   FROM sys.tables AS t

   INNER JOIN sys.indexes AS i

   ON t.object_id = i.object_id

   INNER JOIN sys.partitions AS p

   ON i.object_id = p.object_id

   AND i.index_id = p.index_id

   LEFT OUTER JOIN IndexUsageStats AS ius

   ON i.object_id = ius.ObjectID

   AND i.index_id = ius.IndexID

   WHERE i.index_id > 0

   AND t.is_ms_shipped = 0

   AND t.type = N'U'

),

SuggestedPartitionCompressionTypes

AS

( SELECT pud.*,

         CASE WHEN pud.ScanPercentage >= @ClusteredColumnstoreScansCutoff

              AND pud.UpdatePercentage <= @ClusteredColumnstoreUpdatesCutoff

              AND pud.SeekLookupPercentage <= @ClusteredColumnstoreSeeksLookupsCutoff

              AND pud.TotalRows >= @ClusteredColumnstoreTotalRowsCutoff

              AND @IsClusteredColumnstoreSupported <> 0

              THEN N'CCI'

              WHEN pud.ScanPercentage >= @PageCompressionScansCutoff

              AND pud.UpdatePercentage <= @PageCompressionUpdatesCutoff

              THEN N'PAGE'

              ELSE N'ROW'

         END AS SuggestedCompression

  FROM PartitionUsageDetails AS pud

)

SELECT spct.SchemaName,

       spct.TableName,

       spct.IndexName,

       spct.PartitionNumber,

       spct.CurrentCompression,

       spct.SuggestedCompression

FROM SuggestedPartitionCompressionTypes AS spct

WHERE spct.SuggestedCompression <> spct.CurrentCompression

ORDER BY spct.SchemaName,

         spct.TableName,

         CASE WHEN spct.IndexID = 1 THEN 0 ELSE 1 END,

         spct.IndexName;

 

6 thoughts on “Suggest Compression Strategies for Tables and Indexes”

  1. Recently we implemented TDE on all our client databases.  With encryption compression is no longer effective and the backups are roughly equivalent in size to the used data pages in the database.  Backups have increased dramatically in both size and duration.  To reduce those effects I implemented page compression for any heap or index over 500mb.  For the most part the queries are reports are large updates; there is little in the way of typical OLTP queries.
    Do you see big downsides in this?
    And do you think there is value in compressing small indexes?

  2. Greg – great post!
    Can these recommendations be used for both DW and OLTP workloads? Would you recommend row defaults for DW?

  3. Very nice, Greg! Thanks for sharing.
    -Kev
    P.S. What's your opinion of sp_estimate_data_compression_savings?

  4. Hi Don,
    Important to be clear that TDE would mess with backup compression but no issues with table compression, which is what this post is discussing.
    For the table compression, you might want to run the script to see what is recommended, rather than a blanket rule on page for anything over 500MB.

  5. Hi Jay,
    Yes, same applies. The issue is not the generic use of the DB (such as DW or OLTP); it's how each table and index is used. For different usage, different recommendations would be generated.

  6. Hi Kevin,
    It works pretty well. It seems to create a new version of the table in msdb based on a table sample. Then it compresses it to see what happened. The accuracy totally depends upon how representative that sample was. For tables that are fairly consistent in nature, the estimate is pretty good.
    One thing to note is that I recall a bug when it was first released. My recollection is that the RTM version of 2008 had a bug where that proc would fail for tables that didn't have a clustered primary key. That was quickly fixed.

Leave a Reply

Your email address will not be published.