One of my colleagues, Itzik Ben-Gan is known to most in the SQL Community, particularly for his passion around the T-SQL language.
He's recently written an amazing blog post that describes why these functions are needed, along with a plea for helping promote the ideas to the Microsoft product group via the Connect site.
Anyone with an interest in T-SQL should read the post as, by itself, it provides great insights into the language and the possibilities that some subtle changes could bring.
You'll find it here: http://www.sqlmag.com/blogs/puzzled-by-t-sql/tabid/1023/entryid/13085/Window-Functions-OVER-Clause-Help-Make-a-Difference.aspx
Highly recommended reading!
With all due respect, OLAP functions were included in SQL-99 standard, and should be attributed to Bob Lyle and others. The following great article was written almost a decade ago:
http://www.ibm.com/developerworks/data/library/techarticle/lyle/0110lyle.html
I agree it would be nice to have them, but do we really need them in each and every RDBMS? Is it not better to choose the right tool for the job? There are quite a few powerful alternatives to OLAP functions in SQL Server.
For example, recently I was trying to speed up a rather smart SQL Server query. Surely OLAP functions would help me out, but I just loaded up my data into a C# module, wrote a 20-line method, and that sped things up 20,000 (twenty thousand) times with very little effort.
In another case, a system was prototyped in SQL Server, but deployed in prod in Oracle.
As far as I recall, Itzik Ben-Gan has been promoting OLAP functions for several years, maybe since DB2 and Oracle implemented them. There are no results so far. Maybe, just maybe, there is a remote possibility that SQL Server just positions itself differently from DB2 and Oracle, and OLAP functions do not really belong in it?
Alex,
I don't really understand your position. You were forced to write a C# module to solve a performance problem — something that's not especially difficult, but not really a natural fit, and which is only required due to the failure of T-SQL to provide adequate means by which to express complex queries. And then you ask whether SQL Server actually needs this functionality. Haven't you already shown that it DOES indeed need the functionality? Why should users be forced to deal with SQLCLR instead of using what is supposed to be the first choice language for the DBMS?
SQL Server positions itself as an enterprise DBMS capable of handling the largest and most complex OLTP, OLAP, and DW scenarios. Storing the data is one piece of the puzzle, retrieving it another, and answering complex queries the final and perhaps most important piece. And here is where T-SQL has managed to lag far behind its competition. There is no question in my mind that this functionality would add a huge amount of value to T-SQL and the SQL Server stack in general. And I don't think it has to do with positioning, but rather with politics within Microsoft. A very sad situation in my opinion.
Adam,
When I work on Windows, my first choice language is C#, especially now, with lambdas and generics and other cool new things. IMO it is just a much better language for most of the problems I typically encounter. More specifically, replacing a verbose monster SQL query with a leaner and more succinct C# method is a very natural choice for me. C# code is typically leaner, cleaner, easier to unit test and to maintain.
As an additional benefit I don't have to worry that at some time the optimizer will fail on me, produce a terrible execution plan, and I will have to re-optimize, retest, and redeploy. With C#, I have written my own execution plan, so plan stability is already there.
So, in this case I don't really need anything, I am all set already.
Regarding whether "SQL Server positions itself as an enterprise DBMS capable of handling the largest and most complex OLTP, OLAP, and DW scenarios", I have no idea of the politics you are speaking about, but I do see what gets done and what does not. Based on that available to outsiders knowledge only, I cannot completely agree with you – I would assume that important things get done, less important don't. So things which do not get done, like the "answering complex queries" you are speaking about, are not really that important, are they?
What do you think?
Hi Alex,
C# is leaner and cleaner? Not sure I agree with that. In some cases, yes, I can express something in C# more succinctly than in T-SQL. But in many other cases that's not the case. Ever tried writing a hash join in C#? It's a lot more lines of code than INNER HASH JOIN, and a lot more likely to contain bugs than using the SQL Server team's implementation.
As to your other point regarding what does and does not get done, let's take a step back. What does "important" mean? Who gets to decide, and based on what metrics? If you ask my customers, "answering complex queries" is of the utmost importance. If you ask some other customers, perhaps not. It's all relative. From where I sit, failing to meet the demands in this area set the product very far back compared with its competitors.
By the way, based on what you've said, I'm not sure why you even bother with SQL Server. If your C# is so fast and solid, why not write a simple ISAM provider and simply use flat files to store your data?
Hey Adam,
What is leaner and what better facilitates code reuse in in the eye of beholder of course. I agree that having SQL server do that hash merge join for you is better than doing it yourself.
I have no idea "Who gets to decide, and based on what metrics", but the requests to fully implement OLAP functions are consistently not accepted, so apparently they are not important to whoever are making the decisions. So we practitioners may spend a little bit of time and request a feature once or twice, but then it is cheaper to just move on and use another tool or solve the problem ourselves.
In practical world, repeating the same thing over and over again and expecting different results is not very wise, is it?
Regarding "why you even bother with SQL Server", it actually is extremely useful for me, with backup, restore, isolation, transactions, ability to execute SQL and so on. In most cases it has all the features I need.
We can compare it to electric wiring, which should be robust and up all the time. It does not have to be able to do fancy things. For many projects all we need is robust storage, backup and recovery, and good processing of basic queries.
I concentrated on SQL Server four years ago. Before that, I worked a lot with Oracle, where I had a lot of practice with the full unleashed power of OLAP functions. Yet in some cases a simple cursor was fast enough, and faster to develop, than an extremely clever solution with OLAP functions. Cursors are very fast in Oracle, you know. Which means that in my limited experience fully implemented OLAP functions were not that much important to me.
So, while it is nice to have OLAP functions, in most cases their presence alone will not motivate us to move over to Oracle.
I tend to think that there are many things I would like to see improved/finished in SQL Server before such niceties as SUM…ORDER BY and LAG/LEAD were added.
My other concern is that the implementation of the suggested windowing extensions would be poor and/or incomplete. The common subexpression spool design used in existing OVER…PARTITION BY plans does not inspire confidence that built-in support necessarily implies good performance.
Hi Alex,
I (and quite a few others) feel very strongly about enhanced windowing support in SQL Server. We also feel that if we raise enough awareness to the topic, and get support from people, there is a chance that Microsoft will add the missing functionality to SQL Server. And that if we won't act, there is a chance Microsoft will not. It could be that such support was not added to SQL Server 2008 partially because the requests were not loud enough and not made enough time prior to the release of the product. It's hard to say. But there is this chance that if we act we can change things… I don't think our requests are ignored; but I do believe that the inclusion of these features is hanging on a thread, so for those who do believe in them, it is important to act now.
I do appreciate your concerns about the deficiency in repetitions, and how unwise it is in the practical world; duly noted. This may surprise you, but despite the inefficiency in the way some of us conduct our affairs, while acting for a cause we believe in in one thread, some of us do also work on and use solutions to the existing problems using existing means in other threads. Such things are not mutually exclusive. But the point is that life could be much, much simpler…
Another word on repetitions…
We probably come from different disciplines. I was taught to never give up on something I strongly believe in. I was also taught that repetition, persistence, and sweat, are the keys to getting things done, to mastering techniques, and (pardon the cliché) to enlightenment. I have to say that I learned that this is very true in SQL.
And let me assure you; if Microsoft won't implement enhanced windowing support in Denali, both myself and others will keep being a pain in the a## until we see such support in the product we care so much about.
Hope you will be convinced as well at some point and join the effort. 🙂
Hi Paul,
I second your "concern is that the implementation of the suggested windowing extensions would be poor and/or incomplete". On top of that, implementing such functionality in C# is usually trivial, so why bother.
Hi Itzik,
I admire your work, love your books, but I cannot agree with you on this. Because I am out there in the trenches, implementing solutions for my customers, it would not be very professional for me to "feel very strongly" about any technology – that would impact my ability to choose what is best for my customers.
Besides, the motivation for "getting things done" and the motivation for "mastering techniques" may contradict each other.
What you are saying about "repetition, persistence, and sweat" rings very true for martial arts, book writing and such, and for my ultra-marathon running as well. For example, I spent a lot of effort getting my book right, and enjoyed this process, and "repetition, persistence, and sweat".
Yet "repetition, persistence, and sweat" might not be very practical for developing solutions for customers – while we are "mastering techniques", our competition may come up with an better way to get things done. While we are perfecting that ultra complex karate move, the competition buy a gun, spend an hour learning how to use it, and we are not competitive any more. While we are mastering the use of brush and watercolors, the competition buy a digital camera. This kind of thing.
Let me site from a well known article by Joel Spolsky: "At Microsoft, if you're the Program Manager working on the Excel macro strategy, even if you've been at the company for less than six months, it doesn't matter – you are the GOD of the Excel macro strategy, and nobody, not even employee number 6, is allowed to get in your way. Period."
The way I interpret that, whoever is working on the strategy, are very well insulated from our feedback. If they disagree with our opinion, there is nothing we can do about it. This is my personal interpretation which may or may not be true of course.
Yet it does not really matter if we agree or not, because we can just go ahead and innovate ourselves. You can develop your own solution. I can do the same. We may merge them if we feel like it, or keep them different if we don't. If I like your solution but want to extend it a little bit, I can just write a few extension methods. A lot of open source innovation works like this.
I'm surprised to find myself ever disagreeing with AlexK. However, I'm unhappy with his idea that we should be obliged to fill in the inadequacies of SQL Server by writing C#, rather than lobby for a fuller implementation of the SQL Standard. I'd rather not start hacking .NET code, and on many sites, wouldn't even be allowed to.
No. The position we want to rectify is that SQL Server has only implemented some of the window functions, but not all. The missing ones will finally lay to rest the last bastion of iterative programming in SQL, the running total, and it will, for the first time, allow us to do statistical calculations. Moving averages can easily be done in SQL Server, but you can only get decent performance by using trick, non-standard, code. These functions also make any operation that requires paging through data in an order you specify, far easier to write and maintain. I agree, these functions take a great deal of skill to implement, but they are already in other comparable RDBMSs.
SQL Standards aren't like a buffet lunch, that you can just pick away at the salami, olives and other bits and pieces that catch the eye. You need to do the lot. You may think of some functions 'Hmm, save a bit of work by leaving that out!', forgetting that, for certain operations, they are extraordinarily useful. I assure you, for certain uses, even LAG and LEAD aren't just 'niceties'.