SQL Cache Dependencies

A little while back I wrote an article on SQL Cache Dependencies. I had a colleague ask about these today so I think it's time to blog it. (For more information, I also did a podcast some time back with my old mates Wally McClure and Paul Glavich from the ASP.NET Podcast. It's at: http://weblogs.asp.net/pglavich/archive/2005/08/21/423160.aspx)

SQL Dependencies in ASP.NET v2

 

Traditional ASP Applications

In traditional ASP applications, every time the user would request a page, the application would retrieve the data from the database to render the page. Clearly this doesn’t scale very well. It works for a small number of users but not a large number. Often though, this data access can be completely avoided for certain types of data.

Consider the Production.ProductCategory table in the AdventureWorks database. Realistically, how often would it change? If we are building an application that displays the list of product categories, it doesn’t make sense to retrieve it from the database on every request. We could improve on this situation if the same user was retrieving this list regularly by holding it in the user’s Session state but that also wasn’t that helpful if there were a large number of users as many copies of the same data would be held.

ASP.NETCache Object

A key advance in ASP.NET was the introduction of the Cache object. It provided a server-wide shared cache that could be used to hold relatively static data. We could read the list of product categories into a dataset and hold it in the Cache.

The issue with any cached data though is when to refresh the data in the cache. If we choose to refresh it once per day, we then could be showing an incorrect list for up to an entire day. If we base the refresh policy on elapsed time, we have a trade-off. The shorter the refresh time, the fresher the data but the less useful the cache is. The longer the refresh interval, the more efficient the cache becomes but the staler the data potentially is.

The ASP.NET Cache object in v1 gave us two options for cache invalidation. One was time based, the other was based on a file dependency. This was originally designed to allow for situations where the data was coming from an XML file stored in the filesystem. A filesystem watcher was set up to monitor the file. If the file had changed, the Cache would be invalidated and refreshed.

But the invalidation option that we really wanted was “when the data changes”.

Cache Invalidation Prior To ASP.NET v2

Prior to ASP.NET v2, there were a number of attempts to provide cache invalidation when the data changed. The most popular version was:

1.       Create an extended stored procedure in SQL Server that simply touched a nominated file in the filesystem.

2.       Add a trigger (for INSERT, UPDATE and DELETE) on the table to be monitored that called fired the trigger.

3.       Set the cache to have a dependency on that file.

This option was miles ahead of where we started but it had a few downsides. First, writing extended stored procedures for SQL Server was not trivial. It couldn’t be done in managed code but a C++ template was available to use as a starting point. Second, the trigger would fire for any changes to the table, even if our query was only for certain rows in the table. Third, the mechanism was table-based and didn’t deal with queries that involved joins across multiple tables. Last, the mechanism was tied to a specific database engine and often to a specific version of a database engine.

ADO.NETv2 SqlDependency And SQL Server 2005

SQL Server 2005 added a wonderful new feature called Query Notifications. (NOTE: nothing to do with Notification Services – they often get confused). The idea here is that you could say to SQL Server “Here is a query I want you to execute. However, when you finish, I don’t want you to forget the query, I want you to remember it and, if the underlying data that it was based on changes in such a way that *if* I was to re-execute the query I’d get a different response, I want you to tell me about it”.

This is exactly what we need to support our ideal Cache invalidation scenario. We don’t want to re-execute the query, unless the data has changed. You might be thinking “that sounds like a lot of hard work for the database engine”. Well it is a bit of work but it’s in fact utilising the same mechanism that was already in place for detecting data changes underneath Indexed Views in SQL Server 2000. What it does mean is that we have some limitations on the types of query that this will work with:

·         Use two part names for tables (SQL Server needs to know which schema so it knows exactly which table, rather than having to resolve that at query time)

·         Reasonable complexity (not all queries can be implemented)

·         Specific column lists (ie: No SELECT * FROM somewhere)

Normally we SQL Server, we send a query and get a response. We don’t expect unannounced data to arrive sometime later. So the next issue is how SQL Server can send us the notification. To do this it uses another new feature called Service Broker. Service Broker is a transacted queue that lives within the database server. Communication occurs between “services” attached to “endpoints”. We then need to set up an endpoint ready to receive the notification.

The ADO.NET team did a lot of this for us. They created a new object called the SqlDependency. It is a simple programmable object that wraps most of this functionality for us. Basically, it’s used by:

·         Create a SqlCommand you wish to execute

·         Wrap a new SqlDependency around the command (you pass the command to the constructor of the dependency)

·         An event on the SqlDependency object fires when the data changes.

It’s that easy. The one remaining requirement is to set up the queue to listen for the notifications. You do this by calling the Start() method on the SqlDependency object once before you get going.

This new object can be used in any ADO.NET v2 application, including in WinForms applications. However, its primary use-case is in ASP.NET Cache invalidation. You can use it in code like this:

using (SqlConnection conn = new SqlConnection(“some connection string”))

{

      SqlCommand cmd = new

         SqlCommand(“SELECT EmployeeID, EmployeeName FROM Employees”,conn);

      SqlCacheDependency dependency = new SqlCacheDependency(cmd);

      SqlDataAdapter employeesDA = new SqlDataAdapter(cmd);

      DataSet employessDS = new DataSet(“EmployeesDS”);

      employeessDA.Fill(employeesDS,”Employees”);

      Cache.Insert(“emp”, employeesDS, dependency);

}

ASP.NET v2 SqlCacheDependency

Even though the ADO.NET v2 team had done most of the work, the ASP.NET team decided to make it even easier to use with the Cache object. In v2, the Cache object can now have a SqlCacheDependency in addition to the previous options of file and time dependencies. This can be applied to the Cache object and also to other cached objects like user controls. You can specify a directive like:

<%@ OutputCache Duration=”2400” VaryByParam=”none”

               SqlDependency=”northwind:Employees” %>

You can also directly use the SqlDependencies in code:

Cache.Insert(“emp”,obj,

        new SqlCacheDependency(“northwind”,”Employees”));

The code shown is a general form of the SqlCacheDependency settings. In ASP.NET v2, when using SQL Server 2005’s special mechanisms, we use the specific string “CommandNotification” rather than database and table names.

What If I Don’t Have SQL Server 2005?

Rather than tying their implementation directly to SQL Server 2005, the ASP.NET team also decided to create a pluggable architecture so other forms of SqlCacheDependency could be provided. For example, they have provided an implementation for SQL Server 2000. As SQL Server 2000 doesn’t offer query notifications or service broker, the implementation isn’t as good as the one for 2005. It works like this:

·         A table is created to hold details of change notifications. (A command has been provided to help with this: aspnet_regsql.exe –d databasename –ed –E)

·         Triggers are set up on the required tables. (Again a command has been provided to assist: aspnet_regsql.exe –d databasename –et –t tablename –E)

·         ASP.NET polls the change notification table to look for changes.

Obviously polling is not as efficient as being notified but it’s still a whole lot more efficient that having the application constantly looking up the data.

This mechanism can then be extended to other database engines.

LINQ – Lessons that could have been learned from languages like Progress

In a previous post, I mentioned lessons I learned from what I think Progress made mistakes at. However, when looking at the implementation of LINQ in .NET, I think much could have been learned from these types of applications and tools.

At the time I used to use it, Progress applications implicitly has a database context. It's often said that over 90% of applications are data-related. This is the argument for why data libraries shouldn't be an add-on to the language but should be baked in.

I'd go further and suggest that over 90% of the data-related applications operate in the context of a single database. So why don't applications or middle-tier components have the option to specify that database context globally?

I really liked the detailed integration of database objects and language objects in Progress. It was quite seamless. If I wanted to iterate through customers, I could just say:

for each customer:

If I had the equivalent of a where clause, I could just say:

for each customer where somecolumn = something:

To locate a particular customer row, I could just say:

find customer where somecolumn = something.

If there were multiple possible rows, I could say:

find first customer where somecolumn = something.

Where LINQ still feels like an add-on to the language, this is very natural syntax and behaviour. I totally struggle with why there was any need to use SQL-like syntax with LINQ. And given that LINQ is a language extension for querying, unrelated to databases, why not use syntax that's more related to object querying rather than a messed-up variant of SQL?

Another aspect that the Progress language had was default display formats associated with each data type. An entire program (not a great one :-)) could be:

find customer where somecolumn = something.

update customer.name.

These two lines encapsulated the data access, the update logic, the transaction scoping (poor in this case), the entire screen layout and display/update format for the column, etc. I'm not suggesting I want that style of code any more, particularly given application layering and tiers but it gives an indication of the level of abstraction that you could work at. We're still a long way short of that. (People who use Access are probably chuckling at this point).

I suspect we still need a standardised way to store a variety of column and data type related metadata in the database. We can do a lot of that via extended properties but everyone does it in a different way.

Progress DBMS – three lessons for ISV's including collective deafness

Alphatross posted an interesting reply to my blog entry about LINQ and Entity Framework terminology. He asked if I'd worked with the Progress DBMS. I have. Here are my thoughts on it. Obviously others will have a different view of history but I mention Progress over and over again when I'm teaching classes as three examples related to them really hit home for me.

I worked with Progress for many years, starting in about 1983 through to some time in the 90's.

When they started, they wrote a product that just made sense. They didn't care about being compatible with any older code and made an application and language that enabled you to write good code quickly. The thing I love most about the product was its stability. We migrated sites over to Progress and they just went quiet. When you are an ISV, quiet sites are a good thing. For character based RDBMS systems, I still haven't seen anything I prefer for the types of applications we were building.

Then Windows appeared.

Lesson 1:

Companies that have had a good run and good track record tend to build up a collective deafness and eventually seem to have no mechanism to hear it when people say "we don't like your products". Version 6.2 of Progress was wonderful. Version 7 of Progress was horrid. I didn't like it. I told them that. Their marketing folk kept insisting that the lack of new version adoption was due to "you guys not wanting to move to Windows". We kept saying "we are moving to Windows, in fact we've already done so but not with your product". They just didn't *want* to hear the real message and were deafened to that message by their own previous successes.

Lesson 2:

Windows applications need to behave like Windows applications. This seems simple but it's very important. Progress didn't "get" it. They kept producing versions of their applications which were Windows-like but which didn't really behave like Windows applications. As one simple example, if they displayed items in a list box, there would always be a scroll bar, even if there was nothing to scroll. This is very confusing to a user who sees a full list box and a scroll bar and thinks there will be more found by scrolling. I asked why they did this and they told me reasons that had to do with keeping compatibility with their character-based (ie: non-Windows) versions. One key reason for the success of Windows has been the shared knowledge you get from the platform. It minimises the amount you need to learn about new applications as all applications (should!) behave in a similar way. 

This is the problem when you have existing legacy customers. If you endlessly try to keep backwards compatibility, you'll eventually be run over by a new competitor that writes a new application that just makes sense for the current time and couldn't care less about being compatible with your old code.

Lesson 3:

Another thing that drove me crazy with Progress was that I could never keep purchasing the same base license versions. As an ISV, if I did all my building and testing on say version 6.2K, that's the version I want to install at customer sites. I don't want to have my application supplied with 6.2L today or 6.2L+ tomorrow then 6.2M a month later.

For customers using my application, no new feature added to the system is going to change *anything* in a positive way as the application won't have been built to use it. The best I could hope for from a new version is that the application would still work the same way it used to. All other possible outcomes are worse than where I am now, not better, except for the hopefully rare situation where there's an underlying serious problem that I haven't stumbled across in testing.

Software Engineering process matters much more than new versions. And the first rule is that you don't keep randomly changing the underlying platform.

Are certification exams useful for experienced people?

There has been an interesting discussion on an internal Microsoft mailing list regarding the certification process and exams. I've seen posts from many people that are very experienced with products saying they can't see any point in the certification exams and that competencies in the Microsoft Partner program shouldn't be based on exams. They feel these people should somehow just be recognised for their other contributions.

 

Regarding the certification process, I don’t agree that anyone should be just “grandfathered” in. Any of the people that  have a great deal of knowledge and experience should be able to just take the exams and be done with it. I find it really easy to take them at events like TechEd. I typically book in for one each morning at 8am and usually it’s only $50.

 

Alternately, I do them in beta where they are free. The biggest hassle I have with doing the beta exams is it requires me to know where I’ll be on a particular day in a few months time. I usually haven’t got a clue even what city I’ll be in that far out, unless there is an event on.

 

I also question that none of them would get any learning benefit from the process. I often see people that have worked with a product for ten or more years who can’t pass an exam on it. Mostly, that’s because they use 40% of a product every day and the exam covers aspects of the product they never look at.

 

One of the reasons I do take the exams is it makes me study the whole product. Using SQL Server as an example, it helps avoid the problem I see where people use SQL Server 2005 or 2008 the same way they used SQL Server 6.5 or 7. I like getting the prep guides and making sure I’ve covered off all the topics on it. Given I spend so much of my time evangelising topics at the prerelease or early adopter stage, there usually isn’t much in the exams that’s a surprise by the time the product comes out. After I’ve covered the material, doing the exam is then almost an afterthought.

 

However, I don’t consider that having done the exam means much at all. You can buy a complete word-for-word copy of the exams for about $30 out of China if all you want to do is just pass the exams. Until this nonsense stops, the fact that someone has done them means almost nothing on its own, unless they took them in the beta phase where no such cheating options are available.

 

For those that take the exams seriously though, there definitely is value in the preparation process, rather than in actually taking the exam. Once you've done the prep work, you might as well do the exam. An additional benefit is that many people are timeline driven and having an exam booked tends to create a sense of urgency in preparation.

 

Notification Services isn't deprecated in SQL 2k8, it's gone

Darren Neimke posted a link to some great content today from his colleague Steven Nagy. Steven had done a bunch of posts on getting started with Notification Services.

It's a great set of posts but it's important to get the message out that in the next version of the product, Notification Services isn't just deprecated, it's gone. It isn't wise to use it for any new development work unless you're intending to stay on SQL Server 2005 for the life of your application.

For that reason, I've stopped teaching any notification services material in SQL training classes as well.

SQL Down Under show 32 – Dr Michael Rys – SQL Server and XML

Hi Folks,

 

SQL Down Under show 32 with SQL Server product group member Dr Michael Rys is now available for download from www.sqldownunder.com. In this show, Michael discusses why XML matters in SQL Server, provides an overview and tips on the support provided in earlier versions and discusses the changes upcoming in SQL Server 2008.

 

Enjoy!

 

Regards,

 

Greg

LINQ Terminology 101 for DBAs and Others Puzzled

I was responding to a discussion in the SQL MVP newsgroups today about LINQ. A comment was made about understanding of the basic terminology and layers of software. I figured I should blog this as well to help someone.
 
1. LINQ per se is unrelated to databases. It's a language extension that provides an easy to use query syntax for things that are enumerable. You can build a "LINQ to xxx" provider where xxx is almost anything. The thing I find weird about it is that they used SQL keywords as operators at all.
 
2. LINQ to XML is excellent to use compared to working with the XML-based objects that we had to use in .NET before. The latest version of VB is stunningly good at working with XML natively compared to any other language. Bill McCarthy (local excellent VB MVP) has written a really excellent article on this here: http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2421
 
3. LINQ to SQL is what most people are talking about now. The way most people are using it is to build an object model out of their table layouts in a 1 to 1 fashion and only for SQL Server. Sadly, even though the audience is often Enterprise clients and ISVs, this is what most people are shown about LINQ at Microsoft events as it fits well in 7 minute demos. Apart from in the simplest cases, it tends to generate T-SQL that's impenetrable for most humans to debug, particularly those that were trying to avoid T-SQL in the first place. I find great irony lately in every discussion I've seen where people that didn't want to do T-SQL in the first place are pouring over pages and pages of incomprehensible machine-written T-SQL trying to locate performance issues.
 
4. The Entity Framework allows you to have an alternate mapping layer above SQL Server (and potentially other database engines) that you can program to.
 
In the example I've used in the SQL launch materials, I have a Flights table, a Passengers table and a FlightManifests table. FlightManifests is just a simple linking table (many to many) between flights and passengers. In LINQ to SQL, what you'd see is three objects, directly relating to Flights, Passengers and FlightManifests. While this makes perfect sense in the database, it isn't really useful in the application layer. With the Entity Framework, the typical representation of this would only have a Flight object and a Passenger object. The Passenger object would have a Flights collection as one of its properties. The Flights object would have a Passengers collection as one of its properties. This much more closely models the business.
 
But you can go further with this. You can then inherit a new type of object, say "EmployeePassenger" that is a Passenger but has extra properties like an EmployeeNumber. You can also create constrained types such as a FrequentFlier which might be a Passenger that has a FrequentFlierNumber.
 
The Entity Framework then adds extensions to SQL (aka Entity SQL) that are only understood by its provider. So it lets you query the Passengers table where the row is of type FrequentFlier. The upside of this is that you only record the rules for what makes someone a FrequentFlier in one location and you do so declaratively, instead of having to remember to put the right WHERE clause all over the place.
 
"So why wouldn't I do this with views" is the next normal response. Most large databases long outlive the applications that work with them. Often many many applications use the same databases. The traditional approach has been to litter the database with application-specific views and procs.
 
The Entity Framework allows the mappings to live with the application and to be completely different for different applications. For example, I might have a rule that says I don't use abbreviations.I can have "EstimatedTimeOfArrival" in my model even if the DBA prefers "ETA". I probably want singular object names. The DBA might prefer plural table names. etc etc
 
In addition, these mappings can be to/from stored procedures and views, not just to/from tables.
 
5. LINQ to Entities is just the provider to let you use LINQ with the Entity Framework instead of writing Entity SQL.
 
6. ADO.NET Data Services takes this another step further and exposes the Entities (from #4 above) via URLs. (URI's actually) This is considered a logical replacement for native HTTP Endpoints that were in SQL Server 2005. It also provides a much more cross-platform approach.
 
Hope this helps someone. I discussed it all at some length with Pablo Castro in the interview I did a few weeks ago at www.sqldownunder.com. The discussion in the podcast was targetted more at SQL folk than developers.
 

SQL Server 2008 CTP6 hits the streets and you can win an XBox

SQL Server 2008 CTP6 hit the streets today. You can get it here:

http://www.microsoft.com/downloads/details.aspx?familyid=749bd760-f404-4d45-9ac0-d7f1b3ed1053&displaylang=en&tm

It's getting pretty close to feature complete now. Microsoft is keen to have it tested like no other version before. To assist with this, they've introduced a bug bash competition with a bunch of XBox's and other interesting prizes if you help find issues with the CTP. Details are here:

http://connect.microsoft.com/SQLServer/content/content.aspx?ContentID=7734

It's time to go bug-hunting !

TechEd Orlando – 4 sessions and lots of friends there too

I got the good news today that I've had 4 sessions accepted in the first round of approvals for TechEd Orlando. Two are in the developer week and two in the IT Pro week. I'm really looking forward to those two weeks as TechEd's are always great events.

Good to see so many of my colleagues also doing sessions. Peter Myers has two precons together with Douglas McDowell and Craig Utley. Peter also has two sessions in the dev week. Adam Machanic is really excited about his first TechEd session coming up. I just know it'll be great so don't miss it. Fernando Guerrero has an interesting session planned, Andrew Kelly has several as does Itzik Ben-Gan both of whom are always great value. Maciej Pilecki also has a session this year that looks excellent.

Our friends at SQL Skills are also doing a number of sessions. Bob Beauchemin has some really interesting looking ones along with a precon and Kim Tripp and Paul Randall are also doing a double-act at a precon.

And great to see another local Jeff Wharton from WardyIT doing a session, hot after winning his Speaker Idol spot at Barcelona! And recent SQL Down Under guest Matthew Roche is there as well.

And that's just some of the info from the SQL tracks. All up, it should be a great event.

Crocodiles and curiosity

Late last night I watched the end of a wonderful documentary from the BBC called Dragons Alive. Although Mai didn't really want to discuss it at 12:30am, it really got me thinking about the nature of curiosity and its importance in how we advance science and through it, society.

Most awesome advances in science seem to come through people asking seemingly child-like questions. Some of the ones presented at the end of the show were great examples.

One that really struck me was about crocodiles. We all know they live in stagnant water quite happily. How do they live there without getting sick? If you'd have asked me, I would have guessed it was the tough outer skin. But the question that should have been obvious to us all is why don't they get infected and die when they get injured? If we exposed our injuries to stagnant water, it'd probably be the end of us.

There has been lots of work on that over the last few years and it looks like it'll yield antibiotics stronger than anything we've ever seen before.

But the interesting question isn't the specifics of this case, it's why we haven't asked that question before and what we need to do to encourage people (particularly children without preconceptions) to ask "dumb" questions.