Data Camp Sydney (Free)

Hi Folks, on the 5th June (yes that’s Friday next week), I’m running a Data Camp day for the local Microsoft team. It’s being held at Cliftons in the city.

We’ve got four topics for the day:

  • Azure SQL DB
  • Azure DocumentDB
  • Azure Machine Learning
  • Azure Stream Analytics

If you want to get your head around any/all of these, we’d love to see you there. Places are limited, so you must register; you can do so here:

Latest MVA Offerings for SQL Server and for Windows 10 (Yes it’s time to start looking at this)

The team at Microsoft Virtual Academy (MVA) have pushed out some new content that’s relevant to database people.

First, if you’re wondering about using Azure for SQL Server, the Jumpstart for SQL Server in Azure VMs is worth a look. Longer term, I suspect we’ll mostly end up using SQL Server as a platform service (Azure SQL DB) but in the short-term, implementing it in a VM will be more common as it’s probably both easier when migrating existing applications and a little more familiar to most.

Next, if you have to deal with other databases (shock horror, yes there are others including open source ones), there is a course on Open Source Databases on Azure.

Finally, you would have to have been living under a rock not to notice that Windows 10 is coming. But now, it’s time to start to get your head around what’s different. There’s a course that covers off the Fundamentals of the Technical Preview of Windows 10.

More Useful MVA Training Options

I find many of the MVA courses great for quickly getting across concepts or areas that I haven’t worked with before.

This month, the local MVP program has asked me to highlight the following MVA courses. The first two in particular look useful for SQL folk. The third one provides good background:

Azure IaaS Technical Fundamentals

Deep Dive into Networking Storage and Disaster Recovery Scenarios

Embracing Open Source Technologies to Accelerate and Scale Solutions

If you watch them, let me know what you think.

Azure Machine Learning Course–First Melbourne Offering

Hi Folks, we’ve been working hard on a new Azure Machine Learning course.

Come and spend a solid day finding out why Azure Machine Learning should be part of your arsenal.

Our first Melbourne offering of Azure Machine Learning Core Skills is on 31st July. I’d love to see you there:

Lots of (Mostly Free) eBooks from Microsoft Virtual Academy

I know that many of my readers appreciate finding out when free eBooks become available.

Microsoft Virtual Academy has been releasing quite a few lately. You’ll find them here:

In particular, SQL Server and database folk might want to take a look at the following:

  • Introducing Microsoft SQL Server 2014
  • Introducing Windows Server 2012 R2
  • Introducing Microsoft Azure HDInsight
  • Microsoft Azure Essentials: Fundamentals of Azure
  • Introducing Windows Azure for IT Professionals
  • Rethinking Enterprise Storage: A Hybrid Cloud Model

Clustered columnstore index rebuild script–updated

Recently I published a series of blog posts that provided scripts for rebuilding clustered columnstore indexes.

Niko Neugebauer sent me a suggested update to the latest version of those scripts, and it makes sense. The change filters out row groups that are not compressed. Niko noted that as Microsoft have started to include more objects (like Tombstone values) in sys.column_store_row_groups in Azure SQL Database, this filtering should help to prevent the scripts having problems with future SQL Server versions.

Sounds good, so the latest version is shown below. Thanks Niko!


-- Rebuild clustered columnstore indexes when necessary
-- Dr Greg Low v2.1

-- rebuild when more than supplied % of rows have been deleted
-- rebuild when any segments contain more than supplied % deleted rows
-- rebuild if more than supplied number of segments are empty
-- rebuild when segments are on average less than the supplied % of
--     maximum capacity

DECLARE @DeletedTotalRowPercentage int = 10;
DECLARE @DeletedSegmentsRowPercentage int = 20;
DECLARE @EmptySegmentsAllowed int = 0;
DECLARE @SegmentPotentialFullnessPercentage int = 90;

DECLARE @IndexesToRebuild TABLE (SchemaName sysname,
                                 TableName sysname,
                                 IndexName sysname);

WITH ClusteredColumnstoreIndexes
AS
( SELECT t.object_id AS ObjectID,
         SCHEMA_NAME(t.schema_id) AS SchemaName,
         t.name AS TableName,
         i.name AS IndexName
  FROM sys.indexes AS i
  INNER JOIN sys.tables AS t
  ON i.object_id = t.object_id
  WHERE i.type = 5 -- clustered columnstore
),
RowGroups
AS
( SELECT csrg.object_id AS ObjectID,
         csrg.total_rows AS TotalRows,
         csrg.deleted_rows AS DeletedRows,
         csrg.deleted_rows * 100.0 / csrg.total_rows AS DeletedPercentage,
         CASE WHEN csrg.total_rows = csrg.deleted_rows
              THEN 1 ELSE 0
         END AS IsEmptySegment
  FROM sys.column_store_row_groups AS csrg
  WHERE csrg.state = 3 -- Compressed (Ignoring: 0 - Hidden, 1 - Open, 2 - Closed, 4 - Tombstone)
),
IndexStats
AS
( SELECT cci.ObjectID,
         cci.SchemaName,
         cci.TableName,
         cci.IndexName,
         SUM(CAST(rg.TotalRows AS decimal(18,0))) AS TotalRows,
         SUM(CAST(rg.DeletedRows AS decimal(18,0))) AS DeletedRows,
         SUM(CAST(rg.DeletedRows AS decimal(18,0))) * 100.0
           / SUM(CAST(rg.TotalRows AS decimal(18,0)))
           AS DeletedPercentage,
         SUM(rg.IsEmptySegment) AS EmptySegments,
         COUNT(rg.TotalRows) AS TotalSegments
  FROM ClusteredColumnstoreIndexes AS cci
  INNER JOIN RowGroups AS rg
  ON cci.ObjectID = rg.ObjectID
  GROUP BY cci.ObjectID, cci.SchemaName, cci.TableName, cci.IndexName
)
INSERT @IndexesToRebuild (SchemaName, TableName, IndexName)
SELECT s.SchemaName, s.TableName, s.IndexName
FROM IndexStats AS s
WHERE s.DeletedPercentage > @DeletedTotalRowPercentage
OR s.EmptySegments > @EmptySegmentsAllowed
OR EXISTS (SELECT 1 FROM RowGroups AS rg
                    WHERE rg.ObjectID = s.ObjectID
                    AND rg.DeletedPercentage > @DeletedSegmentsRowPercentage)
OR (s.TotalRows * 100 / (s.TotalSegments * 1048576.0))
            < @SegmentPotentialFullnessPercentage;

DECLARE @SchemaName sysname;
DECLARE @TableName sysname;
DECLARE @IndexName sysname;
DECLARE @SqlCommand nvarchar(max);

DECLARE IndexList CURSOR FAST_FORWARD READ_ONLY
FOR
  SELECT SchemaName, TableName, IndexName
  FROM @IndexesToRebuild
  ORDER BY SchemaName, TableName, IndexName;

OPEN IndexList;

FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;

WHILE @@FETCH_STATUS = 0
BEGIN
  SET @SqlCommand = N'ALTER INDEX ' + QUOTENAME(@IndexName)
                  + N' ON ' + QUOTENAME(@SchemaName)
                  + N'.' + QUOTENAME(@TableName)
                  + N' REBUILD PARTITION = ALL;';
  PRINT @SqlCommand;
  EXECUTE (@SqlCommand); -- comment this out if you only want the commands printed
  FETCH NEXT FROM IndexList INTO @SchemaName, @TableName, @IndexName;
END;

CLOSE IndexList;
DEALLOCATE IndexList;
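
If you’re curious about what ends up in sys.column_store_row_groups on your own systems, a quick query like the following shows the distribution of row group states. This is just my own exploratory sketch, not part of the script above:

SELECT OBJECT_NAME(csrg.object_id) AS TableName,
       csrg.state,
       csrg.state_description,
       COUNT(*) AS RowGroups
FROM sys.column_store_row_groups AS csrg
GROUP BY OBJECT_NAME(csrg.object_id), csrg.state, csrg.state_description
ORDER BY TableName, csrg.state;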



Demos must work–simple concept? So why is it so rarely applied?

When I attend events like TechEd, like many people I usually find the networking time more valuable than the session time. There is a pretty tight limit on the number of sessions you can attend, no matter how hard you try. So I often watch the sessions later when I can. At one of the earliest TechEd Australia events that I attended, they gave us CDs of the TechEd USA sessions. That was great because that TechEd had around 250 sessions, and there were over 150 that I would have loved to attend. Clearly that wasn’t possible.

I was doing a lot of driving to/from client sites at that time, and so I had a lot of available listening time. I dragged the audio out of all the TechEd session videos and just listened to the audio. If a session really interested me, I’d go back and watch the video and demos.

That year, I listened to or watched around 150 sessions. While it was intense and interesting, there was something I wasn’t expecting, and it completely stunned me: in close to 100 of those sessions, I heard the presenters apologising for demo failures. I found that really, really hard to believe. I was determined that when I presented any sort of session at events like these, I wasn’t going to be one of those people, or at least I’d do my very best to avoid it.

Having your session work as planned already puts you in the top third of all the sessions.

So if you want to be on the right side of this equation, what can you do?

1. Have realistic goals for the amount of content.

I normally aim to tell people three things in a session. They certainly won’t remember more than that and it’s the stories that they’ll remember anyway, so make sure that each demo has a good story associated with it. And if showing any one of these three things takes more than about 15 to 20 minutes, try again. Blaise Pascal said “I would have written a shorter letter, but I did not have the time”. Plan the content and tell the story succinctly. Plan your timing. It’s hard work to get just the right message in just the right amount of time.

I have lost count of how many sessions I’ve been to that run out of time or that failed to make one of the key points. Don’t be one of them.

You’ll be especially sorry if your session description includes content that you didn’t end up covering. Someone might have come just for that content.

2. Aim for repeatable achievable outcomes.

I’ve seen so many demos that would probably only ever work with the moon in the correct position and the presenter holding his/her head the right way. Don’t do this.

3. Have a clear structure.

There’s received wisdom that sessions should be all demos. I don’t buy it as a hard rule. I’ve been to brilliant sessions with no demos at all, and I’ve been to horrid sessions from amazing people who had no structure to what they were trying to show.

4. Practice both the session and the demos.

And I mean multiple times. The bigger the event, the more the whole session needs to be second nature. Try to deliver the session at smaller venues first. Local user groups, virtual sessions, etc. are good options for this.

5. Find another presenter as a critical friend.

I have friends that are talented presenters and I love having one of them in the room for trial runs, with a view to being critical. Someone that says “yeah that was great” is nice. Someone that says “you lost me in the second part of the demo” or “I think the third demo would work better if you…” is what you need. Be prepared to do the same for them.

6. Record the demos.

When presenting at large events, I have a series of screen shots saved on a USB key, and I also have a full video walkthrough of each of the demos. I’m determined that the audience will get to see the demos.

As a simple example of when this has saved me, two years ago for TechEd Australia, I was presenting some Azure-related sessions. The Azure folk had decided to do maintenance and take things offline right in the middle of the event.

I told the audience, switched across to the videos of each demo, and did a live voice-over; I suspect many of them quickly forgot they were watching a video. By comparison, I attended several other Azure-related sessions at that same event and watched presenter after presenter stumble when things would not work. You need a fallback plan.

Hint: Don’t just play the video with its recorded sound, though – keep it feeling like a live demo. I’ve seen sessions where people simply play a video with sound, and it often looks like they could never have actually done the demo, particularly if the recorded voice is someone else’s.

7. Don’t try to debug live.

Unless it’s an obvious and trivial issue, you will do far more damage by trying to debug it. Attendees hate watching you stuff around trying to fix issues. You might feel great *if* you ever get it solved but you will have messed up your timing and possibly looked really, really bad in the meantime. And if you can’t solve it, you will have really messed up. Instead, move on and use your backup plan.

Isn’t this what everyone does?

It seems pretty basic to do these things but time and again, I see the opposite happening even at major events. I watched AzureConf the other day, and even the keynote had these issues. Having been involved in event keynotes and knowing what level of rehearsal normally goes into them, I can just imagine the discussions that went on later. They wouldn’t have been pretty.

You can avoid being one of these statistics with just a bit of planning, and you’ll already be ahead of the pack.

Should there be code differences between Azure SQL Database editions?

I spend a lot of time working with software houses, helping them to make their applications work well with SQL Server. One thing that I’ve heard loud and clear over the years is that most software houses won’t write a single line of code that will only run on the enterprise edition of SQL Server, because they are not prepared to limit their potential pool of customers to those running enterprise edition.

This is completely at odds with the discussions that I’ve had with the SQL Server marketing team members who think that having feature differences will cause people to purchase enterprise edition instead. I’m sure that’s true for customers who write their own applications in-house and is also why at promotional events, the customers that you see mentioned are often those types of customers. However, most SQL Server customers run 3rd party applications written by other companies. The customers will often ask the software houses what software is required to run their applications and they then purchase what they need, unless they have some other pre-existing form of relationship with Microsoft.

So having a difference in features can actually cost Microsoft money, as customers will often purchase standard edition because that’s all that the software they’ll be running requires.

Worse, when software houses are comparing SQL Server to other database engines, they compare SQL Server standard edition to the other engines, not enterprise edition. This makes SQL Server compare badly for marketing reasons rather than technical ones. For example, I saw a software house the other day comparing SQL Server with PostgreSQL. Their contention was that PostgreSQL (a free database engine) had a good high availability story and that SQL Server did not, because standard edition only had mirroring (and log shipping), and Microsoft had announced the deprecation of mirroring. The fact that enterprise edition has a really good availability story was irrelevant, as they don’t consider anything in that edition.

A further issue appears with coding. There is no developer edition of SQL Server that is limited to standard edition features. Software houses want to write code once and have it work across all target editions.

Another core issue is that this focus on enterprise edition has removed the upgrade reasons for standard edition customers. I think that every edition should have a compelling upgrade story, for every version. As an example, in SQL Server 2014, the reasons to upgrade for standard edition customers are the ability to use 128GB of memory and to have backup encryption. I’ll leave it to the reader to decide if that’s a strong story. I don’t think it is.
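
To be fair, backup encryption is at least easy to adopt. Here’s a minimal sketch of what it looks like in SQL Server 2014; the database, path, and certificate names are made up, and it assumes the master key and certificate have already been created:

-- Assumed one-off setup (illustrative names only):
--   USE master;
--   CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'UseAStrongPasswordHere!1';
--   CREATE CERTIFICATE BackupCert WITH SUBJECT = 'Backup encryption certificate';

BACKUP DATABASE SomeDatabase
TO DISK = N'D:\Backups\SomeDatabase.bak'
WITH COMPRESSION,
     ENCRYPTION (ALGORITHM = AES_256, SERVER CERTIFICATE = BackupCert);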

The final issue with the existing situation is that the product is moving into areas that need support from software houses. SQL Server 2014 introduced a range of in-memory options. For any customer that can’t change the code (ie: most customers), this is irrelevant. Again, you’ll see the same large customers who write their own apps being mentioned in the launch events. In this case, I think the marketing team really have made a mistake. While new HA features, etc. can be retrofitted by a DBA to an existing database, the new in-memory options really need to be architected into the design of the applications. And that’s where it’s a real problem that it’s in enterprise edition only. The software houses are unlikely to use it, yet they are exactly the people we need to embrace it.
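
To see why in-memory OLTP can’t simply be retrofitted by a DBA, consider what even a minimal memory-optimized table involves in SQL Server 2014. This is just a sketch with made-up names, and it assumes the database already has a MEMORY_OPTIMIZED_DATA filegroup:

-- Assumed one-off setup (illustrative names and paths only):
--   ALTER DATABASE SomeDatabase ADD FILEGROUP InMemoryData
--     CONTAINS MEMORY_OPTIMIZED_DATA;
--   ALTER DATABASE SomeDatabase ADD FILE
--     (NAME = 'InMemoryData', FILENAME = 'D:\Data\InMemoryData')
--     TO FILEGROUP InMemoryData;

CREATE TABLE dbo.ShoppingCart
( ShoppingCartID int IDENTITY(1,1) NOT NULL
    PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
  CustomerID int NOT NULL,
  CreatedWhen datetime2 NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);

Hash indexes need bucket counts sized up front, the index types differ, and (in 2014) there are real restrictions on data types and constraints. That’s application design work, not something that gets bolted on later.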

So what does this have to do with Azure?

Bob Beauchemin wrote a great blog post today about how Azure SQL Database is moving to a SQL Server 2014 code base. That’s a great thing, but one aspect that caught my eye was the mention that this is the first version of Azure SQL Database where features like columnstore indexes will only appear in the premium editions.

While I’ve had concerns about how the licensing has been handled in the on-premises versions of SQL Server, in Azure SQL Database this concerns me even more. I really think that Azure SQL Database should offer the same code surface no matter which edition you are using. It makes sense to have performance and availability (including HA) options differ between Basic, Standard, and Premium but I really don’t like the idea of coding/feature differences. First up, it will again see software houses ignoring useful features. But worse, in the Azure SQL Database arena, customers are much more likely to use a mix of database editions than they currently do on-premises.

For example, if I am offering an application as a service, I want to be able to have different databases for different tenant customers. I really want to be able to choose the performance, reliability, availability options, etc. on a tenant by tenant basis, not across all tenants that are using my application. Having coding differences across the editions would make this a mess, or at least I think so.
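
As a trivial illustration, here’s the sort of edition-sniffing code that software houses would end up writing if the code surface differs between editions. It’s only a sketch, and the messages are placeholders:

-- Check the service tier of the current Azure SQL Database (sketch only):
DECLARE @Edition nvarchar(40) =
    CAST(DATABASEPROPERTYEX(DB_NAME(), 'Edition') AS nvarchar(40));

IF @Edition = N'Premium'
    PRINT 'Premium: the columnstore code path is available.';
ELSE
    PRINT 'Basic/Standard: fall back to a rowstore-only code path.';

Multiply that by every premium-only feature and every tenant database, and you can see why I’d rather the code surface just didn’t differ.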

I’d love to hear your thoughts.