LINQ Terminology 101 for DBAs and Others Puzzled

I was responding to a discussion in the SQL MVP newsgroups today about LINQ. A comment was made about understanding of the basic terminology and layers of software. I figured I should blog this as well to help someone.
 
1. LINQ per se is unrelated to databases. It's a language extension that provides an easy to use query syntax for things that are enumerable. You can build a "LINQ to xxx" provider where xxx is almost anything. The thing I find weird about it is that they used SQL keywords as operators at all.
 
2. LINQ to XML is excellent to use compared to working with the XML-based objects that we had to use in .NET before. The latest version of VB is stunningly good at working with XML natively compared to any other language. Bill McCarthy (local excellent VB MVP) has written a really excellent article on this here: http://visualstudiomagazine.com/columns/article.aspx?editorialsid=2421
 
3. LINQ to SQL is what most people are talking about now. The way most people are using it is to build an object model out of their table layouts in a 1 to 1 fashion and only for SQL Server. Sadly, even though the audience is often Enterprise clients and ISVs, this is what most people are shown about LINQ at Microsoft events as it fits well in 7 minute demos. Apart from in the simplest cases, it tends to generate T-SQL that's impenetrable for most humans to debug, particularly those that were trying to avoid T-SQL in the first place. I find great irony lately in every discussion I've seen where people that didn't want to do T-SQL in the first place are pouring over pages and pages of incomprehensible machine-written T-SQL trying to locate performance issues.
 
4. The Entity Framework allows you to have an alternate mapping layer above SQL Server (and potentially other database engines) that you can program to.
 
In the example I've used in the SQL launch materials, I have a Flights table, a Passengers table and a FlightManifests table. FlightManifests is just a simple linking table (many to many) between flights and passengers. In LINQ to SQL, what you'd see is three objects, directly relating to Flights, Passengers and FlightManifests. While this makes perfect sense in the database, it isn't really useful in the application layer. With the Entity Framework, the typical representation of this would only have a Flight object and a Passenger object. The Passenger object would have a Flights collection as one of its properties. The Flights object would have a Passengers collection as one of its properties. This much more closely models the business.
 
But you can go further with this. You can then inherit a new type of object, say "EmployeePassenger" that is a Passenger but has extra properties like an EmployeeNumber. You can also create constrained types such as a FrequentFlier which might be a Passenger that has a FrequentFlierNumber.
 
The Entity Framework then adds extensions to SQL (aka Entity SQL) that are only understood by its provider. So it lets you query the Passengers table where the row is of type FrequentFlier. The upside of this is that you only record the rules for what makes someone a FrequentFlier in one location and you do so declaratively, instead of having to remember to put the right WHERE clause all over the place.
 
"So why wouldn't I do this with views" is the next normal response. Most large databases long outlive the applications that work with them. Often many many applications use the same databases. The traditional approach has been to litter the database with application-specific views and procs.
 
The Entity Framework allows the mappings to live with the application and to be completely different for different applications. For example, I might have a rule that says I don't use abbreviations.I can have "EstimatedTimeOfArrival" in my model even if the DBA prefers "ETA". I probably want singular object names. The DBA might prefer plural table names. etc etc
 
In addition, these mappings can be to/from stored procedures and views, not just to/from tables.
 
5. LINQ to Entities is just the provider to let you use LINQ with the Entity Framework instead of writing Entity SQL.
 
6. ADO.NET Data Services takes this another step further and exposes the Entities (from #4 above) via URLs. (URI's actually) This is considered a logical replacement for native HTTP Endpoints that were in SQL Server 2005. It also provides a much more cross-platform approach.
 
Hope this helps someone. I discussed it all at some length with Pablo Castro in the interview I did a few weeks ago at www.sqldownunder.com. The discussion in the podcast was targetted more at SQL folk than developers.
 

4 thoughts on “LINQ Terminology 101 for DBAs and Others Puzzled”

  1. In the movie 'Dr. StrangeLove' when the Russian ambassador announced they had the utlimate weapon, a doomsday device, the good Dr. replied, "what good is a doomsday device if the world doesn't know about it". One can now ask the same question of MS in regards to linq/ef. MS finds itself between a rock and a hard place, between a blank stare at the very least and mis-understanding at the very most.  They are set to launch a new set of ideas in much the same fashion as one would throw an anvil to a drowning man. The sad fact is way too few have a good understanding of what the hell all this fuss is really about:) MS has done a terrible job in communicating what linq is, what issues in fact it is an answer too.  The sql community, by and large, has shown itself ill prepared to understand linq since it long ago gave up thinking that computer science had any relevancy to the relational model. The prevailing view in MS is that sql is legacy code. So they are throwing it out and with it the relational model!. This is a classic example of throwing the baby out with the bathwater. There is no evidence that there is anyone in MS who knows what the relational model even is. One can only assume that they make the simple minded assumption, like so many others, that sql is the same thing! To say that sql is 'cumbersome' or views 'litter' a database must be their idea of humor. They must explain to users in the clearest language possible why they determined many years ago that sql sucks. MS assumes users really understand the concept of 'impedance mismatch'. Well they do not! MS assumes users understand the difference between the strings of (N)Hibernate and Ruby on Rails vs. the idea of an 'integrated' query language. Well they do not. Do you really think users understand what Jim Gray meant by 'traditional relational databases never being designed to allow for the commingling of data and algorithms.'? And why linq is an answer to this issue? Do sql users really understand why stored procedures are part of the problem in O/R and not part of the solution? The fact is few sql users understand the computer science of net. How can they be expected to understand the concept of enumerable, its significance in representing a sequence let alone the importance of an anonymous type. The whole idea of the significance of  'types' in a query that can be represented by a 'variable' is alien to sql users.
    Writing about linq performance, sql mapping that is 'impenetrable', 'entities' as part of an object model and even the syntax design of linq/esql seems to be putting the cart before the horse. It would be much more appropriate to first write about the basic issues involved, what it means to represent a query within an object oriented runtime environment and why it's an advancement over sql. Why such an environment could just as easily support a 'relational' model.  
    Some thirty years ago IBM concocted System-R, a mess called sql, that is still with us in the form of DB2, Oracle and Sql Server. But then, as now, the relational model was a mystery so users bought the farm. Now history has a chance to repeat itself with an object model. But much less so with informed users, no?

  2. Thanks for throwing this out there, Greg.  I've heard a lot of incorrect first impressions about this technology, and this will help to clear some of those up.

  3. Great article, Greg!  Anyone had experience with the old Progress 4GL language (now OpenEdge ABL I think) ? While Progress' RDBMS and 4GL Language made SQL look like Tutorial-D in terms of not adhering to the Relational Model, it *was* nice to have the database querying language seamlessly integrated with the Procedural code.  I'm more of a DBA than a Coder, so I may be out of order here, but DLinq reminds me of the Progress 4GL in that I believe MS are trying to move away from the current situation where SQL queries\stored proc calls are separated as strings that need to passed to objects within code, making the code look like two languages (e.g. C# + T-SQL) gaffer-taped together.

  4. Are the developers going to be on call when the server spikes due to a bad inline (Linq) sql query that starts being called?
    no. they call the DBA who will be able to figure out pretty quickly through profiler the issue.  and guess what, they won't be able to fix it, since it's hard coded in someone's .net code.
    yes this happens, don't kid yourself.

Leave a Reply

Your email address will not be published. Required fields are marked *