Suggest Compression Strategies for Tables and Indexes

I make no secret of the fact that I think that table compression was the best thing added to SQL Server 2008.

It seems like every week I'm talking to clients about it. They have Enterprise Edition but aren't using compression, because they're familiar with O/S-level disk compression and assume that compression means smaller but slower. That often couldn't be further from the truth for customers whose databases are I/O bound, or who don't have enough memory for their buffer cache to work effectively.

It's important, though, to make sure the correct form of compression is used for each table and index. In fact, if a table is partitioned, the decision should be made for each partition of each table and index. We often recommend different compression strategies for different partitions of the same table.
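As a sketch of what that looks like in practice (the table name and partition numbers here are hypothetical), an older, mostly-read partition might get heavier compression while the current, heavily-updated partition keeps a lighter one:

```sql
-- Heavier compression for an older, mostly-read partition
ALTER TABLE dbo.Sales
REBUILD PARTITION = 1
WITH (DATA_COMPRESSION = PAGE);

-- Lighter compression for the current, heavily-updated partition
ALTER TABLE dbo.Sales
REBUILD PARTITION = 4
WITH (DATA_COMPRESSION = ROW);
```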

To make it easier to decide which form of compression might suit, I've created another version of the script that I use as a first pass in determining compression strategies. The earlier version suggested ROW and/or PAGE compression. While this version can do the same, it also considers the clustered columnstore indexes that are part of the Enterprise Edition of SQL Server 2014. (I've included a switch to turn that function on or off in case you don't have 2014 yet).

The cutoff points are configurable but my default values are as shown.

  • CCI (Clustered Columnstore Index) will be recommended where the partition is scanned more than 95% of the time, updated less than 10% of the time, seeks and lookups together are less than 5% of the time, and there are at least 800,000 rows. It will also only be recommended if the clustered columnstore switch is enabled.
  • PAGE will be recommended where the partition is scanned more than 75% of the time and updated less than 20% of the time.
  • ROW will be recommended in all other cases. We believe that ROW should be the default in SQL Server across the board, instead of NONE.

It is important that this script only be run after the system has been in use for long enough to have experienced typical usage patterns.
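Acting on the script's suggestions is then straightforward. For example, if ROW compression were suggested for a rowstore index (the index and table names below are hypothetical), the change is a single rebuild; a CCI suggestion instead means replacing the existing index with a clustered columnstore one:

```sql
-- Apply a suggested ROW compression to a single index
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders
REBUILD WITH (DATA_COMPRESSION = ROW);
```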

I hope this helps you to get into using compression where it makes sense for you.

----------------------------------------------------------------------

-- Suggest data compression changes for tables and indexes

-- Dr Greg Low

-- March 2015

----------------------------------------------------------------------

DECLARE @ClusteredColumnstoreScansCutoff int = 95;

DECLARE @ClusteredColumnstoreUpdatesCutoff int = 10;

DECLARE @ClusteredColumnstoreSeeksLookupsCutoff int = 5;

DECLARE @ClusteredColumnstoreTotalRowsCutoff bigint = 800000;

 

DECLARE @PageCompressionScansCutoff int = 75;

DECLARE @PageCompressionUpdatesCutoff int = 20;

 

DECLARE @IsClusteredColumnstoreSupported bit = 1;

----------------------------------------------------------------------

 

WITH IndexUsageStats

AS

( SELECT object_id AS ObjectID,

         index_id AS IndexID,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0) AS Seeks,

         COALESCE(user_scans, 0) + COALESCE(system_scans, 0) AS Scans,

         COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0) AS Lookups,

         COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Updates,

         COALESCE(user_seeks, 0) + COALESCE(system_seeks, 0)

           + COALESCE(user_scans, 0) + COALESCE(system_scans, 0)

           + COALESCE(user_lookups, 0) + COALESCE(system_lookups, 0)

           + COALESCE(user_updates, 0) + COALESCE(system_updates, 0) AS Operations

  FROM sys.dm_db_index_usage_stats

  WHERE database_id = DB_ID()

),

PartitionUsageDetails

AS

( SELECT SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName,

         i.index_id AS IndexID,

         i.type_desc AS IndexType,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST((COALESCE(Seeks, 0) + COALESCE(Lookups, 0)) * 100.0

                        / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS SeekLookupPercentage,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Scans, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS ScanPercentage,

         CASE WHEN COALESCE(Operations, 0) <> 0

              THEN CAST(COALESCE(Updates, 0) * 100.0 / COALESCE(Operations, 0) AS int)

              ELSE 0

         END AS UpdatePercentage,

         p.partition_number AS PartitionNumber,

         p.data_compression_desc AS CurrentCompression,

         p.rows AS TotalRows

   FROM sys.tables AS t

   INNER JOIN sys.indexes AS i

   ON t.object_id = i.object_id

   INNER JOIN sys.partitions AS p

   ON i.object_id = p.object_id

   AND i.index_id = p.index_id

   LEFT OUTER JOIN IndexUsageStats AS ius

   ON i.object_id = ius.ObjectID

   AND i.index_id = ius.IndexID

   WHERE i.index_id > 0

   AND t.is_ms_shipped = 0

   AND t.type = N'U'

),

SuggestedPartitionCompressionTypes

AS

( SELECT pud.*,

         CASE WHEN pud.ScanPercentage >= @ClusteredColumnstoreScansCutoff

              AND pud.UpdatePercentage <= @ClusteredColumnstoreUpdatesCutoff

              AND pud.SeekLookupPercentage <= @ClusteredColumnstoreSeeksLookupsCutoff

              AND pud.TotalRows >= @ClusteredColumnstoreTotalRowsCutoff

              AND @IsClusteredColumnstoreSupported <> 0

              THEN N'CCI'

              WHEN pud.ScanPercentage >= @PageCompressionScansCutoff

              AND pud.UpdatePercentage <= @PageCompressionUpdatesCutoff

              THEN N'PAGE'

              ELSE N'ROW'

         END AS SuggestedCompression

  FROM PartitionUsageDetails AS pud

)

SELECT spct.SchemaName,

       spct.TableName,

       spct.IndexName,

       spct.PartitionNumber,

       spct.CurrentCompression,

       spct.SuggestedCompression

FROM SuggestedPartitionCompressionTypes AS spct

WHERE spct.SuggestedCompression <> spct.CurrentCompression

ORDER BY spct.SchemaName,

         spct.TableName,

         CASE WHEN spct.IndexID = 1 THEN 0 ELSE 1 END,

         spct.IndexName;

 

Rebuild Partitioned Clustered Columnstore Indexes

A short time back, I provided an updated script that can be used to rebuild clustered columnstore indexes when necessary.

One thing that wasn't covered by that script was the ability to rebuild only individual index partitions. Below, I've provided an updated version of the script that will rebuild only the affected partitions if the clustered columnstore index is partitioned.
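The core of the change is that the generated command can target a single partition rather than the whole index. A manual equivalent (with hypothetical index and table names) looks like this:

```sql
-- Rebuild just partition 3 of a partitioned clustered columnstore index
ALTER INDEX CCI_FactSales ON dbo.FactSales
REBUILD PARTITION = 3;
```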

----------------------------------------------------------------------

-- Rebuild clustered columnstore indexes when necessary

-- Dr Greg Low v2.2

----------------------------------------------------------------------

-- rebuild when more than supplied % of rows have been deleted

-- rebuild when any segments contain more than supplied % deleted rows

-- rebuild if more than supplied number of segments are empty

-- rebuild when segments are on average less than the supplied % of

--     maximum capacity

 

DECLARE @DeletedTotalRowPercentage int = 10; 

DECLARE @DeletedSegmentsRowPercentage int = 20;

DECLARE @EmptySegmentsAllowed int = 0;

DECLARE @SegmentPotentialFullnessPercentage int = 90;

----------------------------------------------------------------------

 

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                 TableName sysname,

                                 IndexName sysname,

                                 PartitionNumber int,

                                 IsPartitioned bit);

 

SET NOCOUNT ON;

 

WITH ClusteredColumnstoreIndexes

AS

( SELECT t.object_id AS ObjectID,

         SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName

  FROM sys.indexes AS i

  INNER JOIN sys.tables AS t

  ON i.object_id = t.object_id

  WHERE i.type = 5

),

RowGroups

AS

( SELECT csrg.object_id AS ObjectID,

         csrg.partition_number AS PartitionNumber,

         csrg.total_rows AS TotalRows,

         csrg.deleted_rows AS DeletedRows,

         csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

         CASE WHEN csrg.total_rows = csrg.deleted_rows

              THEN 1 ELSE 0

         END AS IsEmptySegment

  FROM sys.column_store_row_groups AS csrg

  WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)

),

IndexPartitionStats

AS

( SELECT cci.ObjectID,

         cci.SchemaName,

         cci.TableName,

         cci.IndexName,

         rg.PartitionNumber,

         SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

           / SUM(CAST(rg.TotalRows AS decimal(18,0)))

           AS DeletedPercentage,

         SUM(rg.IsEmptySegment) AS EmptySegments,

         COUNT(rg.TotalRows) AS TotalSegments

  FROM ClusteredColumnstoreIndexes AS cci

  INNER JOIN RowGroups AS rg

  ON cci.ObjectID = rg.ObjectID

  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName, rg.PartitionNumber

)

INSERT @IndexesToRebuild (SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned)

SELECT s.SchemaName,

       s.TableName,

       s.IndexName,

       s.PartitionNumber,

       CASE WHEN EXISTS(SELECT 1

                        FROM IndexPartitionStats AS ps

                        WHERE s.SchemaName = ps.SchemaName

                        AND s.TableName = ps.TableName

                        AND s.IndexName = ps.IndexName

                        AND ps.PartitionNumber > 1)

            THEN 1

            ELSE 0

       END

FROM IndexPartitionStats AS s

WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

OR s.EmptySegments > @EmptySegmentsAllowed

OR EXISTS(SELECT 1 FROM RowGroups AS rg

                   WHERE rg.ObjectID = s.ObjectID

                   AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))

            < @SegmentPotentialFullnessPercentage;

 

DECLARE @SchemaName sysname;

DECLARE @TableName sysname;

DECLARE @IndexName sysname;

DECLARE @PartitionNumber int;

DECLARE @IsPartitioned bit;

DECLARE @SqlCommand nvarchar(max);

 

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

FOR

  SELECT SchemaName, TableName, IndexName, PartitionNumber, IsPartitioned

  FROM @IndexesToRebuild

  ORDER BY SchemaName, TableName, IndexName, PartitionNumber;

 

OPEN IndexList;

 

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

 

WHILE @@FETCH_STATUS = 0

BEGIN

  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                  + N' ON ' + QUOTENAME(@SchemaName)

                  + N'.' + QUOTENAME(@TableName)

                  + N' REBUILD'

                  + CASE WHEN @IsPartitioned <> 0

                         THEN N' PARTITION = ' + CAST(@PartitionNumber AS nvarchar(10))

                         ELSE N''

                    END

                  + ';';

  PRINT @SqlCommand;

  EXEC(@SqlCommand);

  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName, @PartitionNumber, @IsPartitioned;

END;

 

CLOSE IndexList;

DEALLOCATE IndexList;

 

 

 

 

Lots of (Mostly Free) eBooks from Microsoft Virtual Academy

I know that many of my readers appreciate finding out when free eBooks become available.

Microsoft Virtual Academy has been releasing quite a few lately. You'll find them here: http://www.microsoftvirtualacademy.com/eBooks?PRID=MVP-AU-7914

In particular, SQL Server and database folk might want to take a look at the following:

  • Introducing Microsoft SQL Server 2014
  • Introducing Windows Server 2012 R2
  • Introducing Microsoft Azure HDInsight
  • Microsoft Azure Essentials: Fundamentals of Azure
  • Introducing Windows Azure for IT Professionals
  • Rethinking Enterprise Storage: A Hybrid Cloud Model

Interested in Winning a Trip to the Ignite Conference?

This is the first year for Microsoft's new Ignite conference. Many people are keen to go but figure it would break the bank.


Here's another option:

Microsoft Virtual Academy has its Heroes program running again. You get one point in the draw just for signing up. Even better, take some courses: you'll learn something worthwhile and increase your chances of winning the competition.


Sign up and enter now. (NOTE: Australians only for this one, sadly.)

Update on Options for Learning Chinese: eChineseLearning

I seem to have quite a crowd of people who ask me how I'm going with learning Chinese whenever they see me. I promised that I would put together some posts on what has and hasn't worked for me.

Over the last few years I had spent quite a bit of time trying to improve my Mandarin fluency. I had finished the five levels of Rosetta Stone, which helped a lot with vocabulary, but I really don't like what they've done with their online "Rosetta Studio" options. They aren't good value any more, and you don't get enough practice for it to be worthwhile. What I wanted was a better vocabulary and a lot more practice in conversation, as that's where the real learning happens.

I've tried a wide variety of options. My current top recommendation is:

eChineseLearning: I have no idea where I first came across them (probably an online search), but this has been wonderful. It provides a structured, one-on-one conversational learning environment. Each week, I spend three separate sessions of just under an hour, one on one with a teacher from Wuhan. They have a booking site where available timeslots are shown, both for my primary teacher (Amy) and for alternates if I need to have a class at a time Amy isn't available. I was quite wary about signing up for a service like this, but I'm so happy I did. Amy is both interesting and interested in my learning, and provides a good structured approach to working towards proficiency.

I think this is the quickest way to a good level of proficiency. Each lesson, we spend some time in general conversation, then spend a while working on some new words, then spend some time on grammatical issues, and finally spend some time with listening practice. I like the way it is organised and I like the flexibility of the booking. Most other sites I looked at needed you to commit to the same time each week. With my schedule, that wouldn't work. eChineseLearning allows me to pick times in each week that will work for me. So if I'm travelling, or have family engagements, I can just work around it. You can change it up to 24 hours before each session.

I could imagine this site being good for anyone who wants to start learning the language (i.e., a beginner), or who is keen to become more proficient.

Like all these types of sites, they love referrals. They'd love you to tell them that you heard of them from me. If you do that, both you and I get free extra time. But that's up to you.

Either way, if you've wondered about learning another language like Mandarin or you have already tried and stalled, you might want to get a feel for it. They have a free trial lesson. You can sign up here: http://www.echineselearning.com/free-trial/index.html 

Highly recommended!

Clustered columnstore index rebuild script: updated

Recently I published a series of blog posts that provided scripts for rebuilding clustered columnstore indexes.

Niko Neugebauer sent me a suggested update to the latest version of those scripts, and it makes sense. The change helps to filter out row groups that are not compressed. Niko noted that as Microsoft has started to include more objects (like Tombstone values) in sys.column_store_row_groups in Azure SQL Database, he believes this will help prevent future SQL Server versions having a problem with the scripts.

Sounds good, so the latest version is shown below. Thanks, Niko!

----------------------------------------------------------------------

-- Rebuild clustered columnstore indexes when necessary

-- Dr Greg Low v2.1

----------------------------------------------------------------------

-- rebuild when more than supplied % of rows have been deleted

-- rebuild when any segments contain more than supplied % deleted rows

-- rebuild if more than supplied number of segments are empty

-- rebuild when segments are on average less than the supplied % of

--     maximum capacity

 

DECLARE @DeletedTotalRowPercentage int = 10;  

DECLARE @DeletedSegmentsRowPercentage int = 20;

DECLARE @EmptySegmentsAllowed int = 0;

DECLARE @SegmentPotentialFullnessPercentage int = 90;

----------------------------------------------------------------------

 

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                 TableName sysname,

                                 IndexName sysname);

 

WITH ClusteredColumnstoreIndexes

AS

( SELECT t.object_id AS ObjectID,

         SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName

  FROM sys.indexes AS i

  INNER JOIN sys.tables AS t

  ON i.object_id = t.object_id

  WHERE i.type = 5

),

RowGroups

AS

( SELECT csrg.object_id AS ObjectID,

         csrg.total_rows AS TotalRows,

         csrg.deleted_rows AS DeletedRows,

         csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

         CASE WHEN csrg.total_rows = csrg.deleted_rows

              THEN 1 ELSE 0

         END AS IsEmptySegment

  FROM sys.column_store_row_groups AS csrg

  WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)

),

IndexStats

AS

( SELECT cci.ObjectID,

         cci.SchemaName,

         cci.TableName,

         cci.IndexName,

         SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

           / SUM(CAST(rg.TotalRows AS decimal(18,0)))

           AS DeletedPercentage,

         SUM(rg.IsEmptySegment) AS EmptySegments,

         COUNT(rg.TotalRows) AS TotalSegments

  FROM ClusteredColumnstoreIndexes AS cci

  INNER JOIN RowGroups AS rg

  ON cci.ObjectID = rg.ObjectID

  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName

)

INSERT @IndexesToRebuild (SchemaName, TableName, IndexName)

SELECT s.SchemaName, s.TableName, s.IndexName

FROM IndexStats AS s

WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

OR s.EmptySegments > @EmptySegmentsAllowed

OR EXISTS(SELECT 1 FROM RowGroups AS rg

                   WHERE rg.ObjectID = s.ObjectID

                   AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))

            < @SegmentPotentialFullnessPercentage;

 

DECLARE @SchemaName sysname;

DECLARE @TableName sysname;

DECLARE @IndexName sysname;

DECLARE @SqlCommand nvarchar(max);

 

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

FOR

  SELECT SchemaName, TableName, IndexName

  FROM @IndexesToRebuild

  ORDER BY SchemaName, TableName, IndexName;

 

OPEN IndexList;

 

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

 

WHILE @@FETCH_STATUS = 0

BEGIN

  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                  + N' ON ' + QUOTENAME(@SchemaName)

                  + N'.' + QUOTENAME(@TableName)

                  + N' REBUILD PARTITION = ALL;';

  PRINT @SqlCommand;

  EXEC(@SqlCommand);

  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

END;

 

CLOSE IndexList;

DEALLOCATE IndexList;

 

PASS BA Analytics Conference for 2015

I'm always telling people that the thing I love about BI projects is that they tend to appeal to the people who pay the bills. This is a good place to be doing project work.

When I look at large companies like Amazon, I can imagine that the people who do the IT work relating to order processing, invoicing, shipping, etc. do a great job. But I'll bet that their lives are full of budget cutbacks, headcount reductions, and increasing pressure to improve productivity. By comparison, I'll bet that the projects funding the parts of their site that make recommendations for possible purchases are funded at an entirely different level. And that's because those projects more directly impact the profitability of the organization, and do so in a more visible way.

When you stop and look at your career, you might want to consider whether you should be trying to move up the value stack within your organization.

A great place to get ideas on this would be to attend the PASS BA Analytics Conference. This relatively new conference has a very different focus to the "standard" PASS Summit. If analytics interests you at all, this is the place to be in April.

And if you provide the registration code UKBA05, you'll get a $150 discount as well.

Improved clustered columnstore index rebuild: potential segment fullness

Earlier, I posted a maintenance script that could be used with clustered columnstore indexes to try to determine when they should be rebuilt. Reorganize is not a very useful operation for these indexes as it basically just forces a close of the existing delta store, ready for compression.

One of the problems with a clustered columnstore index is that it doesn't work as well if its segments have been created at less than their maximum potential size of 1,048,576 rows per segment. Overuse of reorganize operations can cause this, but more likely it is due to loading the index with small batches of data, rather than loading large batches in BULK INSERT operations.

I've updated the maintenance script so that you can now also choose to rebuild if your segments are much less full than their optimal size; for example, you could have 1,048,560,000 rows of data spread across 10,000 segments. So if you specify 90 percent as the cutoff and your segments hold less than 90 percent of their potential rows on average, that will also trigger a rebuild.
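To see how full your segments currently are, you can aggregate sys.column_store_row_groups yourself. This is a sketch of essentially the same calculation the script performs (restricted here to compressed row groups):

```sql
-- Average segment fullness per table, as a percentage of the
-- 1,048,576-row maximum segment size
SELECT OBJECT_NAME(csrg.object_id) AS TableName,
       COUNT(*) AS Segments,
       SUM(csrg.total_rows) AS TotalRows,
       CAST(SUM(csrg.total_rows) * 100.0
            / (COUNT(*) * 1048576) AS decimal(5,1)) AS FullnessPercentage
FROM sys.column_store_row_groups AS csrg
WHERE csrg.state = 3 -- Compressed
GROUP BY csrg.object_id;
```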

Hope this helps someone:

----------------------------------------------------------------------

-- Rebuild clustered columnstore indexes when necessary

-- Dr Greg Low v2.0

----------------------------------------------------------------------

-- rebuild when more than supplied % of rows have been deleted

-- rebuild when any segments contain more than supplied % deleted rows

-- rebuild if more than supplied number of segments are empty

-- rebuild when segments are on average less than the supplied % of

--     maximum capacity

 

DECLARE @DeletedTotalRowPercentage int = 10;   

DECLARE @DeletedSegmentsRowPercentage int = 20;

DECLARE @EmptySegmentsAllowed int = 0;

DECLARE @SegmentPotentialFullnessPercentage int = 90;

----------------------------------------------------------------------

 

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,

                                 TableName sysname,

                                 IndexName sysname);

 

WITH ClusteredColumnstoreIndexes

AS

( SELECT t.object_id AS ObjectID,

         SCHEMA_NAME(t.schema_id) AS SchemaName,

         t.name AS TableName,

         i.name AS IndexName

  FROM sys.indexes AS i

  INNER JOIN sys.tables AS t

  ON i.object_id = t.object_id

  WHERE i.type = 5

),

RowGroups

AS

( SELECT csrg.object_id AS ObjectID,

         csrg.total_rows AS TotalRows,

         csrg.deleted_rows AS DeletedRows,

         csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,

         CASE WHEN csrg.total_rows = csrg.deleted_rows

              THEN 1 ELSE 0

         END AS IsEmptySegment

  FROM sys.column_store_row_groups AS csrg

),

IndexStats

AS

( SELECT cci.ObjectID,

         cci.SchemaName,

         cci.TableName,

         cci.IndexName,

         SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,

         SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0

           / SUM(CAST(rg.TotalRows AS decimal(18,0)))
           AS DeletedPercentage,

         SUM(rg.IsEmptySegment) AS EmptySegments,

         COUNT(rg.TotalRows) AS TotalSegments

  FROM ClusteredColumnstoreIndexes AS cci

  INNER JOIN RowGroups AS rg

  ON cci.ObjectID = rg.ObjectID

  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName

)

INSERT @IndexesToRebuild (SchemaName, TableName, IndexName)

SELECT s.SchemaName, s.TableName, s.IndexName

FROM IndexStats AS s

WHERE s.DeletedPercentage > @DeletedTotalRowPercentage

OR s.EmptySegments > @EmptySegmentsAllowed

OR EXISTS(SELECT 1 FROM RowGroups AS rg

                   WHERE rg.ObjectID = s.ObjectID

                   AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)

OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))
            < @SegmentPotentialFullnessPercentage;

 

DECLARE @SchemaName sysname;

DECLARE @TableName sysname;

DECLARE @IndexName sysname;

DECLARE @SqlCommand nvarchar(max);

 

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY

FOR

  SELECT SchemaName, TableName, IndexName

  FROM @IndexesToRebuild

  ORDER BY SchemaName, TableName, IndexName;

 

OPEN IndexList;

 

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

 

WHILE @@FETCH_STATUS = 0

BEGIN

  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)

                  + N' ON ' + QUOTENAME(@SchemaName)

                  + N'.' + QUOTENAME(@TableName)

                  + N' REBUILD PARTITION = ALL;';

  PRINT @SqlCommand;

  EXEC(@SqlCommand); 

  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

END;

 

CLOSE IndexList;

DEALLOCATE IndexList;

  

 

Demos must work: simple concept? So why is it so rarely applied?

When I attend events like TechEd, like many people I usually find the networking time more valuable than the session time. There is a pretty tight limit on the number of sessions you can attend, no matter how hard you try. So I often watch the sessions later when I can. At one of the earliest TechEd Australia events that I attended, they gave us CDs of the TechEd USA sessions. That was great because that TechEd had around 250 sessions, and there were over 150 that I would have loved to attend. Clearly that wasn't possible.

I was doing a lot of driving to/from client sites at that time, and so I had a lot of available listening time. I dragged the audio out of all the TechEd session videos and just listened to the audio. If a session really interested me, I'd go back and watch the video and demos.

That year, I listened to or watched around 150 sessions. While it was intense and interesting, there was something I wasn't expecting that completely stunned me. I heard the presenters apologising for demo failures in close to 100 of those sessions. I found that really, really hard to believe. I was determined that when I presented any sort of session at events like these, I wasn't going to be one of those people, or I'd do my very best to avoid it.

Having your session work as planned already puts you in the top third of all the sessions.

So if you want to be on the right side of this equation, what can you do?

1. Have realistic goals for the amount of content.

I normally aim to tell people three things in a session. They certainly won't remember more than that and it's the stories that they'll remember anyway, so make sure that each demo has a good story associated with it. And if showing any one of these three things takes more than about 15 to 20 minutes, try again. Blaise Pascal said "I would have written a shorter letter, but I did not have the time". Plan the content and tell the story succinctly. Plan your timing. It's hard work to get just the right message in just the right amount of time.

I have lost count of how many sessions I've been to that run out of time or that failed to make one of the key points. Don't be one of them.

You'll be especially sorry if your session description includes content that you didn't end up covering. Someone might have come just for that content.

2. Aim for repeatable achievable outcomes.

I've seen so many demos that would probably only ever work with the moon in the correct position and the presenter holding his/her head the right way. Don't do this.

3. Have a clear structure.

There's received wisdom that sessions should be all demos. I don't buy it as the only rule. I've been to brilliant sessions with none, and I've been to horrid sessions from amazing people who had no structure to what they were trying to show.

4. Practice both the session and the demos.

And I mean multiple times. The bigger the event, the more the whole session needs to be second nature. Try to deliver the session at smaller venues first. Local user groups, virtual sessions, etc. are good options for this.

5. Find another presenter as a critical friend.

I have friends that are talented presenters and I love having one of them in the room for trial runs, with a view of being critical. Someone that says "yeah that was great" is nice. Someone that says "you lost me in the second part of the demo" or "I think the third demo would work better if you…" is what you need. Be prepared to do the same for them.

6. Record the demos.

When presenting at large events, I have a series of screen shots saved on a USB key, and I also have a full video walkthrough of each of the demos. I'm determined that the audience will get to see the demos.

As a simple example of when this has saved me, two years ago for TechEd Australia, I was presenting some Azure-related sessions. The Azure folk had decided to do maintenance and take things offline right in the middle of the event.

I told the audience, switched across to the videos of each demo, which I did a live voice-over for, and I suspect that many of them would have quickly forgotten that they were watching a video. By comparison, I attended several other Azure-related sessions at that same event and watched presenter after presenter stumbling when things would not work. You need a fall back plan.

Hint: Don't just play the video with voice, etc. as well though – make it still pretty much a live thing. I've seen sessions where people just play a video with sound and it often looks like they could never have actually done the demo, particularly if it's someone else's voice.

7. Don't try to debug live.

Unless it's an obvious and trivial issue, you will do far more damage by trying to debug it. Attendees hate watching you stuff around trying to fix issues. You might feel great *if* you ever get it solved but you will have messed up your timing and possibly looked really, really bad in the meantime. And if you can't solve it, you will have really messed up. Instead, move on and use your backup plan.

Isn't this what everyone does?

It seems pretty basic to do these things but time and again, I see the opposite happening even at major events. I watched AzureConf the other day, and even the keynote had these issues. Having been involved in event keynotes and knowing what level of rehearsal normally goes into them, I can just imagine the discussions that went on later. They wouldn't have been pretty.

You can avoid being one of these statistics with just a bit of planning, and you'll already be ahead of the pack.

Rebuild clustered columnstore indexes when they require maintenance

For general index maintenance, our friend Ola Hallengren has an awesome solution for most people: https://ola.hallengren.com/sql-server-index-and-statistics-maintenance.html

We've started to have customers using clustered columnstore indexes in SQL Server 2014, and they need to be treated differently. I checked Ola's latest scripts today to see what happens with columnstore indexes. It appears that the code ignores nonclustered columnstore indexes (ie: index type of 6), which makes sense as we need to rebuild them whenever the data changes, and in the meantime, the table is read-only. So that makes lots of sense.

For other indexes (including clustered columnstore indexes which are updatable), the code uses the same avg_fragmentation_in_percent values from sys.dm_db_index_physical_stats. Unfortunately, that view doesn't return any data for columnstore indexes of either type, so that's not actually helpful.

Niko has an awesome series of blog posts about columnstore indexes that I encourage you to read: http://www.nikoport.com/columnstore/ In part 36, he discusses what he thinks is needed for maintenance. The metrics that I want to use overlap the ones that he's suggested in that article.

With a clustered columnstore index, when a row is deleted, a flag in a bitmap is updated to indicate the deletion. The row is still present in the columnstore row group. When an update occurs, the same mechanism flags the row as deleted, then inserts the new version into the delta store.

The main problems occur when many rows have been deleted: those rows still take up space in the compressed row groups, and still have to be filtered out on every read.

In the code below, I've provided an implementation that will rebuild clustered columnstore indexes in a database when one of three criteria is met:

  • The overall percentage of deleted rows has exceeded a cutoff (I've suggested a default of 10%)
  • The percentage of deleted rows in any individual segment has exceeded a cutoff (I've suggested a default of 20%)
  • The number of segments that are completely empty (all rows have been deleted) has exceeded a cutoff (I've suggested a default of zero)

This would work for both COLUMNSTORE and COLUMNSTORE_ARCHIVE compression.
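The statement the script issues is a plain rebuild, so each index keeps whatever compression it was created with. If you also want to switch compression as part of the rebuild, you could do it manually along these lines (the index and table names here are hypothetical):

```sql
-- Rebuild and move to archival compression in one step (names are hypothetical)
ALTER INDEX CCI_Sales ON dbo.Sales
REBUILD PARTITION = ALL
WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);
```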

Hope it helps someone until Ola does his magic:

-------------------------------------------------------------------
-- Rebuild clustered columnstore indexes when necessary
-- Dr Greg Low v1.0
-------------------------------------------------------------------
-- Rebuild when more than the supplied % of rows have been deleted
-- Rebuild when any segment contains more than the supplied % of deleted rows
-- Rebuild if more than the supplied number of segments are empty

DECLARE @DeletedTotalRowPercentage int = 10;
DECLARE @DeletedSegmentsRowPercentage int = 20;
DECLARE @EmptySegmentsAllowed int = 0;
-------------------------------------------------------------------

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,
                                 TableName sysname,
                                 IndexName sysname);

WITH ClusteredColumnstoreIndexes
AS
( SELECT t.object_id AS ObjectID,
         SCHEMA_NAME(t.schema_id) AS SchemaName,
         t.name AS TableName,
         i.name AS IndexName
  FROM sys.indexes AS i
  INNER JOIN sys.tables AS t
  ON i.object_id = t.object_id
  WHERE i.type = 5 -- clustered columnstore
),
RowGroups
AS
( SELECT csrg.object_id AS ObjectID,
         csrg.total_rows AS TotalRows,
         csrg.deleted_rows AS DeletedRows,
         -- NULLIF guards against divide-by-zero for empty row groups
         csrg.deleted_rows * 100.0 / NULLIF(csrg.total_rows, 0) AS DeletedPercentage,
         CASE WHEN csrg.total_rows = csrg.deleted_rows
              THEN 1 ELSE 0
         END AS IsEmptySegment
  FROM sys.column_store_row_groups AS csrg
),
IndexStats
AS
( SELECT cci.ObjectID,
         cci.SchemaName,
         cci.TableName,
         cci.IndexName,
         SUM(rg.TotalRows) AS TotalRows,
         SUM(rg.DeletedRows) AS DeletedRows,
         SUM(rg.DeletedRows) * 100.0 / NULLIF(SUM(rg.TotalRows), 0) AS DeletedPercentage,
         SUM(rg.IsEmptySegment) AS EmptySegments
  FROM ClusteredColumnstoreIndexes AS cci
  INNER JOIN RowGroups AS rg
  ON cci.ObjectID = rg.ObjectID
  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName
)
INSERT @IndexesToRebuild (SchemaName, TableName, IndexName)
SELECT s.SchemaName, s.TableName, s.IndexName
FROM IndexStats AS s
WHERE s.DeletedPercentage > @DeletedTotalRowPercentage
OR s.EmptySegments > @EmptySegmentsAllowed
OR EXISTS (SELECT 1 FROM RowGroups AS rg
           WHERE rg.ObjectID = s.ObjectID
           AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage);

DECLARE @SchemaName sysname;
DECLARE @TableName sysname;
DECLARE @IndexName sysname;
DECLARE @SqlCommand nvarchar(max);

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY
FOR
  SELECT SchemaName, TableName, IndexName
  FROM @IndexesToRebuild
  ORDER BY SchemaName, TableName, IndexName;

OPEN IndexList;

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

WHILE @@FETCH_STATUS = 0
BEGIN
  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)
                  + N' ON ' + QUOTENAME(@SchemaName)
                  + N'.' + QUOTENAME(@TableName)
                  + N' REBUILD PARTITION = ALL;';
  PRINT @SqlCommand;
  EXEC (@SqlCommand);
  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;
END;

CLOSE IndexList;
DEALLOCATE IndexList;