SQL Server 2008 R2: StreamInsight – User-defined aggregates

I'd briefly played around with user-defined aggregates in StreamInsight with CTP3 but when I started working with the new Count Windows, I found I had to have one working. I learned a few things along the way that I hope will help someone.

The first thing you have to do is define a class:

public class IntegerAverage: CepAggregate<int, int>

{

    public override int GenerateOutput(IEnumerable<int> eventData)

    {

        if (eventData.Count() == 0)

        {

            return 0;

        }

        else

        {

          return eventData.Sum() / eventData.Count();

        }

    }

}

In this case, I've defined an IntegerAverage class that inherits from CepAggregate. CepAggregate is then declared as a generic type taking an int input parameter and providing an int output. The GenerateOutput method is overriden and called when it's time to process the input. In this case, I've just checked if there are any values, returned zero if not and returned an integer average if there are values.

This is fine but to make it work with Intellisense and make the compiler play nicely, etc. you need to do more. The following code helps:

public static class UDAExtensionMethods

{

    [CepUserDefinedAggregate(typeof(IntegerAverage))]

    public static int IntAvg<T>(this CepWindow<T> window,

                                Expression<Func<T, int>> map)

    {

        throw CepUtility.DoNotCall();

    }

}

We define the name that we will use in code (in this case IntAvg). As this should never be called directly, we throw an exception if it is ever called.

The part that tricked me at first was this:

Expression<Func<T, int>> map

I had seen it used in the doco in Books Online but BOL didn't ever mention where it came from. To use this, you need to add a using statement in your code as:

using System.Linq.Expressions;

This is the line that maps the function signature. Without it, you'll get an error telling you that your aggregate has no overload taking one parameter. This allows me to then call it in code via:

select new { AverageMilliseconds = w.IntAvg(e =>

                         e.MillisecondsToPassSpeedCheckPoint) };

Passing The Event Instead

An alternative to this would be to build an aggregate that's specific to the the event type. You could then call it via:

select new { AverageMilliseconds = w.SomeAvg() };

but then your user-defined aggregate would have to be defined via:

public class SomeAverage: CepAggregate<TollPointEvent, int> 

In this case, TollPointEvent was the name of the event type that the window is being created over. You'd need to then access members of the event class inside your aggregate code, like:

public override int GenerateOutput(IEnumerable<TollPointEvent> events)

{

  if (events.Count() == 0)

  {

    return 0;

  }

  else

  {

    return events.Sum(e => e.MillisecondsToPassSpeedCheckPoint)

         / events.Count();

  }

}

Finally, you'd need to modify your declaration similar to:

public static class UDAExtensionMethods

{

  [CepUserDefinedAggregate(typeof(IntegerAverage))]

  public static int SomeAvg<T>(this CepWindow<T> window)

  {

      throw CepUtility.DoNotCall();

  }

}

Note that the Expression section would not be needed as the whole event was being passed.

While you can do this, I'd only look to do this where your aggregate is very specific to the event class and where it involved multiple fields from within the event.

 

SQL Server 2008 R2: StreamInsight changes at RTM: Count Windows

Another interesting change in the RTM version of StreamInsight is the addition of a new window type. Count Windows aren't time based but are based on counting a number of events. The window type provided in this release is called CountByStartTimeWindow. Based on that name, you'd have to presume we might get other types of count windows in the future.

This new window type takes two parameters. The first is the number of events. The second is an output policy, similar to the policies now required for the previous window types. The CountWindowOutputPolicy currently only offers one policy type as PointAlignToWindowEnd.

Here's an example of the window in use:

var queryOutput = from w in input.CountByStartTimeWindow

                    (10,

                     CountWindowOutputPolicy.PointAlignToWindowEnd)

                  select new { AverageMilliseconds = w.IntAvg(e =>

                              e.MillisecondsToPassSpeedCheckPoint) };

There are a few important things to note here. The first is that the count window doesn't work with the built-in aggregates such as Sum, Min, Max, Avg, etc. This surprised me and I'm sure that will be an option some time in the future. At present, it works with user-defined aggregates. (I'll describe building this user-defined aggregate in another post).

The second note is that the count isn't strictly a count of events, it's a count of distinct event start times. So if you have only one event per start time, the number of events in the window will match your count parameter. But, if you have multiple events starting at the same time, you can have more events in your window than your parameter has requested.

 

SQL Server 2008 R2: StreamInsight changes at RTM: Access to grouping keys via explicit typing

One of the problems that existed in the CTP3 edition of StreamInsight was an error that occurred if you tried to access the grouping key from within your projection expression. That was a real issue as you always need access to the key. It's a bit like using a GROUP BY in TSQL and then not including the columns you're grouping by in the SELECT clause. You'd see the results but not be able to know which results are which. Look at the following code:

var laneSpeeds = from e in vehicleSpeeds

                 group e by e.Lane

                 into lanes

                 from eventWindow in lanes.TumblingWindow

                      (TimeSpan.FromSeconds(10),

                       HoppingWindowOutputPolicy.ClipToWindowEnd)

                 select new

                 {

                    Lane = lanes.Key.ToString(),

                    AverageSpeed = eventWindow.Avg(e => e.Speed),

                    MaxSpeed = eventWindow.Max(e => e.Speed)

                 };

Executing this code would return:

System.IndexOutOfRangeException: Index was outside the bounds of the array.

This was a known issue in CTP3 that I hoped would have been corrected at RTM. I spoke with the development team and they pointed out that the error only occurs if you're using an anonymous type. This means that if I declare the type using code like:

struct LaneGroup

{

  public string Lane;

  public double AverageSpeed;

  public double MaxSpeed;

}

my code will then work as long as I modify it to specify the type in the select clause:

var laneSpeeds = from e in vehicleSpeeds

                 group e by e.Lane

                 into lanes

                 from eventWindow in lanes.TumblingWindow

                      (TimeSpan.FromSeconds(10),

                       HoppingWindowOutputPolicy.ClipToWindowEnd)

                 select new LaneGroup

                 {

                    Lane = lanes.Key.ToString(),

                    AverageSpeed = eventWindow.Avg(e => e.Speed),

                    MaxSpeed = eventWindow.Max(e => e.Speed)

                 };

 

This should be a suitable workaround in most cases. I gather the original problem with be fixed in a later refresh of StreamInsight.

 

SQL Server 2008 R2: StreamInsight changes at RTM: HoppingWindow, TumblingWindow, SnapshotWindow

We've been working on updating our demos and samples for the RTM changes of StreamInsight. I'll detail these as I come across them.

The first is that there is a change to the HoppingWindow. The first two parameters are the same in the constructor but the third parameter is now required. It is the HoppingWindowOutputPolicy. Currently, there is only a single option for this which is ClipToWindowEnd.

A similar change happened to the TumblingWindow. Curiously, it also takes a HoppingWindowOutputPolicy. I suppose that makes sense though as it is really just a special case of a HoppingWindow.

SnapshotWindow also now has a required parameter but in this case it's a SnapshotWindowOutputPolicy. The option for this policy is just called Clip.

Here are some examples of creating HoppingWindow, TumblingWindow and SnapshotWindow.

var queryOutput = from w in input.HoppingWindow

                         (TimeSpan.FromMinutes(1),

                          TimeSpan.FromSeconds(10),

                          HoppingWindowOutputPolicy.ClipToWindowEnd)

                  select new { VehiclesPerMinute = w.Count() };

 

var queryOutput = from w in input.TumblingWindow

                     (TimeSpan.FromSeconds(10),

                      HoppingWindowOutputPolicy.ClipToWindowEnd)

                  select new { VehiclesPer10Seconds = w.Count() };

 

var queryOutput = from w in input.SnapshotWindow

                            (SnapshotWindowOutputPolicy.Clip)

                  select new { VehiclesPerWhat = w.Count() };

 

Security-related database settings are not restored when a DB is restored

A question came up today about whether it was a bug that the TRUSTWORTHY database setting isn't restored to its previous value when a database is restored.

TRUSTWORTHY is a very powerful setting for a database. By design, it's not restored when a database is. We actually documented this behavior when writing the Upgrade Technical Reference for 2008: http://www.microsoft.com/downloads/en/confirmation.aspx?familyId=66d3e6f5-6902-4fdd-af75-9975aea5bea7&displayLang=en

The other settings that are not restored with a database (for similar reasons) are ENABLE_BROKER and HONOR_BROKER_PRIORITY. After a restore or upgrade of a database, you need to check these. (Note: HONOR_BROKER_PRIORITY was introduced in 2008 so it won't apply to upgrades from 2005 but ENABLE_BROKER does).

Book: Confessions of a Public Speaker: Scott Berkun

It's probably apparent that I've been travelling again a lot lately as the number of posts related to books has gone up.

One book that I picked up along the way and really enjoyed was Scott Berkun's Confessions of a Public Speaker. I could relate to so much of what Scott was talking about and there are quite a few solid nuggets of advice in the book.

It's very important when you are regularly giving technical presentations to spend time learning about the "presenting" part of the task, not just about the "technical" aspects. I found it quite insightful when Scott discussed how giving technical presentations has so much in common with stand-up comedy. It's not that you need to be a stand-up comedian but much can be learned by watching how good stand-up comedians ply their trade. They endlessly deliver the same material but need to make it sound fresh each and every time.

While the book is fairly solid from start to end, I did get quite a few laughs in one of the final chapters where he discusses things that other people have shared with him about their personal speaking disasters. It was also great to see our ex-RD buddy Scott Hanselman (now a star at Microsoft) quoted in the book.

Recommended!

Book: Pro SQL Server 2008 Service Broker: Klaus Aschenbrenner

I've met Klaus a number of times now and attended a few of his sessions at conferences. Klaus is doing a great job of evangelising Service Broker. I wish the SQL Server team would give it as much love.

Service Broker is a wonderful technology, let down by poor resourcing. Microsoft did an excellent job of building the plumbing for this product in SQL Server 2005 but then provided no management tools and no prescriptive guidance. Everyone then seemed surprized that the takeup of it was slow. I even heard noises questioning it's future a while back and I hope those noises have quietened now. The lack of serious tooling in 2008 was a case of seriously "dropping the ball" regarding the product. It also highlights the other real problem with SSMS in the lack of extensibility. If a supported extensibility model for SSMS was available, others would have stepped up to the plate and we'd have really good Service Broker tooling by now, even when Microsoft hadn't provided it.

Enterprise clients are finally getting their heads around what Service Broker does and are starting to use it, in spite of the lack of resources. I've lost count of the number of sites I've gone into that have a problem that Service Broker would address beautifully but when you suggest it, you get very blank stares back. What makes this worse is that most DBAs aren't very familiar with message-based architectures. Ironically, these sorts of architectures can give them much of the scalability they're trying to get from the product. I constantly go into sites where I see people building an unreliable "house of cards" with bunches of inter-connected systems, the failure of any one of which renders the whole system unusable. Asynchronous processing adds a real new dimension to SQL Server and is such a good solution to so many problems.

This book from Klaus is an update to the 2005 book and does a wonderful job of covering most of what you need to know to work with Service Broker. The book is well-written and builds a good story from beginning to end.

The book isn't perfect. In particular, I noticed a number of places where (I assume) a global replacement of the word 2005 with 2008 caused errors, such as statements about mirroring first appearing in 2008, etc. Also, I'd have to disagree with some of the advice that's outside the realm of Service Broker. For example, Klaus shows how to change the SQL Server service account by using the services snap-in for the MMC. The service account should always be changed from within SQL Server Configuration Manager as it updates ACL's, etc. as well as the logon account.

But these sorts of details are minor and Klaus has done a wonderful job of showing how to use Service Broker and explaining why you'd want to do so.

Recommended!

http://www.amazon.com/Pro-Server-2008-Service-Broker/dp/1590599993/ref=sr_1_3?ie=UTF8&s=books&qid=1269815512&sr=8-3

 

OT: Airlines and Podcasts

Those that know me know that I spend an inordinate amount of time on airlines. I also love podcasts, as you can tell from my www.sqldownunder.com site and show. So anything that combines the two is just awesome.

Fly With Joe fits that perfectly. Joe D'Eon provides great insights in his show. I was sad last year that he hadn't posted many shows. I've also been quiet for a couple of months (but that's about to change with a bunch of SQL Server 2008 R2 shows). But I've been so pleased that Joe's got back into the cockpit on his show lately. And also providing some live streaming shows. Recommended!

On a similar vein, flight attendant Betty has been a long time favourite of mine (and Mai's) with her Betty in the Sky with a Suitcase show. Great to see her posting new shows lately. Recommended also!

Betty also has an upcoming book: http://www.amazon.com/Betty-Sky-Suitcase-Hilarious-Attendant/dp/1606390112 Ordered!

SQL Server 2008 R2 – Application and Multiserver Management Learning Materials

My colleagues and I have been working with Microsoft to produce the Metro training materials for SQL Server 2008 R2. We've using those materials to train other trainers around the world. (If anyone will be in Reading in the UK next week, ping me and say "hi". Same for London the following week).

Roger Doherty's group have been hard at work turning these materials into consumable bite-sized pieces of training. This involves videos, demos and hands-on-labs.

The Application and Multiserver Management learning materials I worked on (often called the DACPAC materials and originally codename Synthesis) are now released (for free) as part of the update to the SQL Server 2008 R2 Training Kit. You'll find details here: http://blogs.msdn.com/rdoherty/archive/2010/03/02/sql-server-2008-r2-update-for-developers-training-kit-march-2010-update.aspx

Enjoy!

SQL Server Reporting Services: Should support include files

It's common to want to embed custom code within reports in Reporting Services. One thing I don't like is the inclusion of anything that looks like business logic directly in the reports. However, formatting functions, etc. seem totally appropriate.

 

If I want to embed custom code within Reporting Services though, I currently have two options. One is to embed the code in the report, the other is to reference an assembly. Each of these has drawbacks.

 

If I embed the code in a report (say just some formatting functions), I have to edit every report if I ever change that code.

 

 The answer has been to use an assembly instead. The downside of placing code in an assembly is that I now have a deployment issue. It's very easy to deploy an RDL report file but quite another thing to deploy a .NET assembly that is referenced by it.

 

What I’d really like to see is an in-between option. I think a good solution would be to allow me to have another node in Solution Explorer, called something like “Shared Code” or some similar name. I could then have sets of code that I might want embedded in various reports. Then in the properties of a report, I could just specify that which of these pieces of code I'd like to have included (and embedded) in the report.

 

This would give me the benefit of only a single place to need to write/update the code but no downside on deployment as the code would simply be embedded in the RDL files.

 

I'd love to hear your thoughts on this. If you like it (or even if you don't), the Connect item is here: https://connect.microsoft.com/SQLServer/feedback/details/534679/reporting-services-should-support-include-files-via-code-inclusions