Increasingly, developers are using tools that try to automate code generation when dealing with databases. Stored procedures have been a thorn in the side of these tools, mostly because it's difficult to obtain the metadata that is really needed.
RowSets
Most automated tools try to work out what rowsets can come back from sprocs. The Entity Framework, LINQ to SQL and many others use SET FMTONLY ON to try to determine what might come back from a sproc. This is a flawed mechanism because it returns one set of metadata for every potential code path through the sproc. It really only works for the most trivial sprocs, i.e. single statements that perform basic CRUD operations.
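To see why, consider a sketch like this (the proc and tables are made up): with FMTONLY on, conditional logic is not evaluated, so the server describes the shape of both branches, and a tool expecting one shape gets confused:

CREATE PROCEDURE dbo.BranchingProc (@Choice bit = 0)
AS
BEGIN
    IF @Choice = 0
        SELECT 1 AS A;        -- first potential shape
    ELSE
        SELECT N'x' AS B;     -- second, incompatible shape
END;
GO

SET FMTONLY ON;
EXEC dbo.BranchingProc;       -- returns metadata for BOTH SELECT statements
SET FMTONLY OFF;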
The first thing that is needed is a way to express the sets of rows that might be returned. This should be part of the definition of the sproc. It should include the ability to express multiple sets of rows. Some tools only work with the first rowset but that isn't sufficient because it has become quite common for people to build sprocs that return multiple rowsets to avoid network round trips. I should be able to say something like:
CREATE OR ALTER PROCEDURE SomeSchema.SomeSproc (someparameters)
WITH ROWS
    OrderHeaders(SomeColumn INT, SomeOtherColumn NVARCHAR(50) NULL),
    OrderDetails(AnotherColumn INT, YetAnotherColumn INT, EvenYetAnotherColumn GEOGRAPHY),
EXECUTE AS (execution options here if needed)
Note that I think the rowsets should be able to be named, and I also threw in the need for CREATE OR ALTER. Please, please, please can we have this!!!
Exceptions
For a client to work effectively with a server, it should have knowledge of the potential exceptions that might be thrown by that code. This should also be available in the metadata of the sproc. I think it could be included like this:
CREATE OR ALTER PROCEDURE SomeSchema.SomeSproc (someparameters)
WITH ROWS
    OrderHeaders(SomeColumn INT, SomeOtherColumn NVARCHAR(50) NULL),
    OrderDetails(AnotherColumn INT, YetAnotherColumn INT, EvenYetAnotherColumn GEOGRAPHY),
EXCEPTIONS
    NoSuchCustomer(50020, 'No such Customer'),
    DuplicateOrder(50022, 'That order already exists'),
EXECUTE AS (execution options here if needed)
Clearly, other exceptions could occur in an unplanned way but client code should be able to be configured automatically to deal with potentially expected errors. For example, a code generation tool could automatically build a skeleton error-handling routine for errors that it already knows could exist.
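For context, the closest we can get today is registering the error numbers and raising them by hand; roughly like this (the customer check is just for illustration):

EXEC sp_addmessage @msgnum = 50020, @severity = 16,
                   @msgtext = N'No such Customer';
GO

-- Inside the sproc body, the declared exception would surface like this:
IF NOT EXISTS (SELECT 1 FROM SomeSchema.Customers WHERE CustomerID = @CustomerID)
    RAISERROR (50020, 16, 1);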
Contract
Clearly, this is all heading towards having a contract for a sproc. When you have multiple people (or worse, multiple teams of people) building parts of an application, it is really important to have a contract at the interface point. Perhaps the contract itself should have a name, i.e. something like:
CREATE OR ALTER PROCEDURE SomeSchema.SomeSproc (someparameters)
WITH CONTRACT SalesOrderHeaderAndDetails
(
    ROWS
        OrderHeaders(SomeColumn INT, SomeOtherColumn NVARCHAR(50) NULL),
        OrderDetails(AnotherColumn INT, YetAnotherColumn INT, EvenYetAnotherColumn GEOGRAPHY),
    EXCEPTIONS
        NoSuchCustomer(50020, 'No such Customer'),
        DuplicateOrder(50022, 'That order already exists')
),
EXECUTE AS (execution options here if needed)
Enforcement in T-SQL
I'd love to have a situation where some of this is enforced as much as possible within T-SQL. For example, given the declarations above, a SELECT statement that doesn't match one of the rowsets in the metadata, or a RAISERROR with an error number not in the list of declared exceptions, would be enough to make the creation or alteration of the sproc fail. Perhaps any sproc with a declared CONTRACT could have these sorts of restrictions, or we could have a keyword like ENFORCED:
CREATE OR ALTER PROCEDURE SomeSchema.SomeSproc (someparameters)
WITH CONTRACT SalesOrderHeaderAndDetails ENFORCED
(
    ROWS
        OrderHeaders(SomeColumn INT, SomeOtherColumn NVARCHAR(50) NULL),
        OrderDetails(AnotherColumn INT, YetAnotherColumn INT, EvenYetAnotherColumn GEOGRAPHY),
    EXCEPTIONS
        NoSuchCustomer(50020, 'No such Customer'),
        DuplicateOrder(50022, 'That order already exists')
),
EXECUTE AS (execution options here if needed)
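To make that concrete, under such a contract, statements like these in the body would be rejected at CREATE or ALTER time (hypothetical behavior; the table is illustrative):

SELECT SomeColumn, GETDATE() AS ExtraColumn
FROM SomeSchema.SomeTable;    -- shape matches neither OrderHeaders nor OrderDetails

RAISERROR (50099, 16, 1);     -- 50099 is not in the declared EXCEPTIONS list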
In the feedback, I was asked for a Connect item. Connect has since been retired. At the new feedback site, the idea is here: https://feedback.azure.com/d365community/idea/3b9b8fea-e74f-ec11-a819-0022484bf651
Hi Greg,
I like this idea *a lot*. I've seen pieces of it in other database engines. One advantage of these features: they would reduce the friction between the database and application tiers. Very cool.
:{> Andy
Really great idea Greg. I know that others have asked for CREATE OR ALTER. Here is the active Connect item: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=127219.
The contract or rowset idea is great. You can do this with views and CTEs, so why not stored procedures?
I think that enabling TVPs for OUTPUT would get us most of the way there in a reasonable fashion. Then you could name your outputs and would be good to go on that part. Some games could probably be played so that if the caller is an app rather than another stored proc, the data would be streamed back instead of actually put into an object in tempdb (at least in some cases; there would be cases where this would be impossible). Lots of potential for that feature to be leveraged for contracts. Now we just need to get the attention of the right people at MS. Not an easy task…
But using TVPs would require instantiating the data in the temp object first (wasting precious RAM/disk resources) as opposed to simply streaming the data back in TDS directly. Seems like they could be able to 'know' that the TVP was simply being populated to be streamed back and not used subsequently in the sproc as an effective workaround to accomplish your objective though.
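For reference, a table-valued parameter today can only be declared READONLY, which is why OUTPUT TVPs would be new surface area; a minimal sketch with illustrative names:

CREATE TYPE SomeSchema.OrderHeaderType AS TABLE
(
    SomeColumn INT,
    SomeOtherColumn NVARCHAR(50) NULL
);
GO

CREATE PROCEDURE SomeSchema.ConsumeOrders
    @Orders SomeSchema.OrderHeaderType READONLY   -- OUTPUT is not permitted here today
AS
    SELECT SomeColumn, SomeOtherColumn FROM @Orders;
GO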
Oh god yes, I've been bleating about this criminally overlooked deficiency in sprocs for years without ever making any headway. Contracts are vital for SSIS which totally relies on being able to interrogate for metadata.
The only trouble is that sprocs would still need to be backward compatible, thus implementing this would further fragment an already fragmented language. This is part of the reason why I often suggest it's time for a new language.
Do you have a Connect submission for this Greg?
-Jamie
Hi Jamie,
Yes, I've just added one: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=525653
Regards,
Greg
How would you handle it when the results change based on input parameters? For example:

create proc myproc @choice bit = 0
as
    if @choice = 0
        select a from t
    else
        select b from t
It's like you cobbled together the ideas in my mind and turned them into concise, useful suggestions. Are you available for parties? 🙂
Excellent idea!
Entity Framework 4 is proving to be a thorn in my side, and it is a very difficult choice between hand-crafting the details of each recordset returned by every stored procedure or changing every stored procedure to return one row when provided with null parameters.
However, we may be looking at this problem the wrong way. I wonder whether these access-layer technologies could prompt for test values for the parameters rather than just passing nulls. This would produce a recordset and provide the output column details. Otherwise we will be adding additional information to each and every stored procedure unnecessarily.
Hi Jon,
In that case, I'd suggest that two potential rowsets would be defined. I'm not suggesting that every execution would produce all the rowsets defined, just that the rows output should match one of the definitions.
Regards,
Greg
C Rummel,
Good point. Perhaps we should be stricter about what people can actually DO with a sproc, and perhaps returning multiple resultsets shouldn't be one of them!
After all, a contract inherently implies some strictness for the sproc, so why not take it a step further and be stricter about what a sproc can actually do?
-Jamie
One other thought… if we were to impose contracts on the sproc resultset(s), why shouldn't the same be true of the return value?
Could we have sproc return types that are something other than integers?
Could we have sproc return types that are constrained (either by DDL or entries in a table column)?
Just a thought!
Good point Jamie, but I think having other return types would be a truly breaking change. I was trying to propose methods that wouldn't break existing code at all.
Perhaps, though, we should have something in the contract that says whether or not a return value should be returned. If none is meant to be returned, a statement that returns a value would be a violation of the ENFORCED option. Conversely, if a value is meant to be returned, the lack of at least one RETURN statement would be an error.
The result shape should NEVER change based on input parameters. Polymorphic result shape is a violation of the ideals of design by contract, and in most languages it's not even possible to accomplish.
Adam,
Yes indeed, but unfortunately, while T-SQL allows it, devs will continue to do it.
You're right that other languages don't allow it – so perhaps we need a new language??? (sorry, broken record here)
Greg,
Yes, specifying the existence (or not) of a return value sounds like a step in the right direction.
I suspect this subject might arise in a certain heated discussion in about 4 weeks time 🙂
-Jamie
Me again (sorry, this topic is just too interesting to me)
By sheer coincidence, one of my colleagues encountered this problem just this morning. He put an ELSE clause onto an existing IF in a sproc; that ELSE clause returned a NULL.
From that moment forth, the SSRS report that consumed that sproc "hung" indefinitely. It turned out that he had forgotten to cast the NULL as the expected type; as soon as he corrected the code, the report executed instantly (and successfully).
It's almost poetic that this occurred just a few hours after you posted this blog entry, Greg 🙂
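For anyone who hasn't hit this, the fix described is the classic one; a sketch with made-up names:

IF @SomeCondition = 1
    SELECT TotalAmount FROM dbo.SomeTable;
ELSE
    SELECT CAST(NULL AS DECIMAL(18, 2)) AS TotalAmount;  -- without the CAST, the untyped NULL gives the column a different type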
When we discussed this, he came up with exactly the same suggestion as Adam: OUTPUT TVPs.
-Jamie
There is a Connect request here for output TVPs: https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=299296#tabs
Jamie, why would you need something other than integers for return types? Isn't this what output parameters are for? You shouldn't be using return values to return *data*, IMHO.
As for the idea in general, I like it and support it; however, it is easy to see that at the engine level there is little interest in enhancing metadata capabilities. For example, we *still* can't tell, without parsing procedure text directly, what the default value for a stored procedure parameter is.
I think an OUT TVP, as Adam mentions, is the most likely path we'll get to achieve this; however, it means that the consuming languages will either need to be tricked into believing this is a "normal" resultset or updated to adapt.
C Rummel,
Maybe in those cases you should consider different stored procedures. Not only do you avoid a changing result shape, the optimizer can also deal with two separate procedures instead of constantly switching plans due to different inputs.
I agree with all of you almost completely in principle, so maybe I can ask for some pointers on how to turn that ideal into reality. I'm reminded of what I heard Kimberly Tripp refer to, in a session I attended a few years ago, as "swiss army procedures": procedures whose specific intention is to be polymorphic, under the guise of reducing the total size of the result both by rows and columns (why return extra stuff the user doesn't want to see?) while still supplying some kind of encapsulation.
So if procedures aren't supposed to change result shape, yet requirements say users have to control the shape by choosing from a list of columns in any combination, then we're saying you either can't use procedures or you have to code separate procs for all allowable permutations and have the app choose which to use up front, right?
I've seen these all-in-one types of procedures before and would love to find a better way. I'll try to dig out my notes from that session again, but any additional input is welcome.
Just reread my last comment – wanted to make sure it was clear that Kimberly Tripp was *NOT* advocating those "swiss army procs", quite the opposite. I just thought she had a good name for them.
C Rummel,
With increasing network capacity between app server and database, and the availability of more complex client-side controls (Ajax, JSON, etc.), I've found in our development that filtering columns out of resultsets is less of a priority. Give the client the whole payload, and if they don't want to see column 'x', the client can hide that column. Filtering *rows* is typically much more effective, so we make them search or otherwise narrow their resultset. Our relative cost of slightly more complex web development is much lower than our relative cost of slightly more complex database development.
WELL SAID!!!
I like the contract.
One thing I have done is to generate unit tests for parameters and result sets in stored procedures. This is not the equivalent of a contract; however, using Rhino Mocks in the C# data access layer, I can establish a unit test against a mock. I can then use the same mock to test my SP inputs and outputs to confirm both match.
Because I use functions and stored procedures a lot, I like using NUnit to test them. Having started down the path of testing I have found my procedures are becoming more granular; one result set, etc.
If there is a lot of setup work for results in the SP, I try to turn that into a function that can be tested independently. Then I create another function or stored procedure for each result set. Finally, if I need a single round trip, I might put together a container SP that calls each individual piece.
With SQL 2k8, and the ability to pass sets as parameters, much of the fiddling should go away.
I think a few more round trips are a fair price for reliable, maintainable, sustainable, testable code, especially in today's world of high-speed networks.
These are just some practical methods I have found that work with the current state of things in SQL Server. It works with other DB Engines as well.
My 2 Cents
Aaron,
Thanks for your response, I appreciate it. I'll try to keep myself open to just returning all the columns users can pick from in the future, but I still get concerned about returning hundreds of columns to the app when the user is typically only looking at 10-15 columns at a time; it seems such a waste of bandwidth.
Hundreds of columns? I'd suggest you have more important problems than bandwidth. 🙂
I just can't wait (*grin*) to work with databases that have tens of thousands of columns. The need to increase the maximum number of columns to 30,000 in SQL Server 2008 RTM from the already-high limit of 10,000 in the CTPs beggars belief.
Yeah, I'm sure I do. But to give you an idea where I'm coming from: if you've looked at corporate financial statements (cash flow statements, balance sheets, and income statements), there are a lot of line items that go back 10 years each, and the requirement is to denormalize all those values into a single row per company for display. So where you'd normally have fewer columns with a row per company per year, they need those pivoted out. It gets ugly fast.
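For readers who haven't seen this pattern, a minimal sketch of that kind of pivot (the schema is illustrative):

SELECT CompanyID, [2007], [2008], [2009]
FROM (SELECT CompanyID, FiscalYear, Revenue
      FROM dbo.FinancialLines) AS src
PIVOT (SUM(Revenue) FOR FiscalYear IN ([2007], [2008], [2009])) AS p;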
C Rummel,
I will admit that I haven't used PIVOT very often in T-SQL. Like my belief that RETURN values should be restricted to indicating execution status, I also believe that (where possible) pivoting is the job of the client. As more capabilities arise and more sophisticated report builders are developed, I am less and less inclined to make it the database's responsibility to produce a report in exactly the format and layout that a visual report requires. If that were always the right approach, we would still be building web sites with brain-dead classic ASP front ends to display complicated database reports.
Though I do agree that there is likely tons of legacy code that will continue to do things like you describe, so enforcing a contract across the board will not be possible for many versions to come.
Hi Aaron,
many clients do not pivot efficiently. Also, pivoting dense data may speed up your queries:
http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/11/20/pivoting-dense-data-may-speed-up-your-queires.aspx
Alex,
Granted, but clients are much easier to switch out than back-end architecture… if your client is bad at pivoting, try a different client; surely someone else has done it better. You may end up with PIVOT on the server side, and I didn't mean to imply that there is no place for PIVOT in T-SQL. What I meant was that, particularly in C Rummel's case, where he is using PIVOT and dynamic SQL to return a dynamically shaped result set, it would never fit into Greg's ideal "contract" world. It's more the dynamic SQL part that is the problem, of course, but PIVOT is not a completely flexible and perfect solution in the best case, never mind one which must adhere to contract rules.
This is good, guys, but I've already been using a form of contracts with my SPs since SQL 2005, by registering an XML schema with SQL Server and then assigning the result set to a typed XML variable. This works like a charm for both input and output: any invalid input data is kicked back immediately, without even executing the sproc code. And yes, I no longer communicate with the db by issuing T-SQL commands or receiving typical rowsets in the traditional manner (IMHO it's archaic). That approach is bad because it does not allow hierarchical datasets and it results in a chatty style of communication. Brrr… people still do it… multiple output result sets from a given SP. Why?
All my contracts are XSDs in SQL Server. It's very easy to pull out which XSDs (with their full definitions) are associated with a particular SP using DMVs; great for generating documentation on the fly. I can understand any system written in this style in minutes. Also, my system isn't bloated with 4 CRUD procedures per table. I have a fully business (domain) centric design: one SP per business transaction. Some tables don't even allow deletes to anyone, so no deletion SP exists for those. An SP may itself insert into 20 tables; the application does not need to know any of that logic, which simplifies the app design by an order of magnitude.
Many people call this business logic and tend to put it outside, in application code, with the help of an ORM, but they might not be aware of the difference between persistence logic and business logic. My SPs more or less contain persistence logic, following the rule "perform this close to where the data are". Also, SQL guarantees compilation errors if you code against something which does not exist in the schema, so no worries there. People who are afraid of writing SQL and SPs, calling it an ancient language, want the ORM, but it really does not help: any change to the db breaks your app, and many times it's not as simple as renaming a few columns when association tables get added or columns get removed.
People do not understand that SPs are not primarily for performance, but for encapsulation of the db design. To whoever does not get it, I say that I treat my db as a service repository, like Amazon or Twitter. Do we care how they store the data? We just call their APIs. All my SPs are service methods associated with contracts. If someone still does not get it and asks why treat the db as web services (per my statement "the application should be dumb and know nothing about the db design"), I refer them to SOA design. Think of why SOA was invented: to abstract the internals of the other system or layer, and to make design and consumption simpler.
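For readers who haven't seen this technique, a minimal sketch of a typed-XML contract (the schema and names are illustrative, not Rajmund's actual code):

CREATE XML SCHEMA COLLECTION SomeSchema.SavePaymentContract AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xs:element name="Payment">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="CustomerID" type="xs:int" />
          <xs:element name="Amount" type="xs:decimal" />
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>';
GO

CREATE PROCEDURE SomeSchema.SavePayment
    @InputData XML(SomeSchema.SavePaymentContract)  -- invalid input is rejected before the body runs
AS
    SELECT @InputData.value('(/Payment/CustomerID)[1]', 'INT') AS CustomerID,
           @InputData.value('(/Payment/Amount)[1]', 'DECIMAL(18,2)') AS Amount;
GO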
While your idea is great, I'm not sure it's worth it for MS to enhance things the way you propose, causing others to write more and more code within the SP declaration (and to maintain it as things change). Again, the contract problem is already solved, at least for me, and it's beautiful to communicate with the db by sending pure objects only (serialize and deserialize in the app).
It's a great abstraction over the physical db design and structural changes. It's the 21st century and I really believe it's time for most people to switch to rich datasets exchanged in the form of strongly typed and validated XML. To validate an outgoing result, I compose my result set as XML and assign it to a typed variable (tied to a schema), which instantly validates that the result matches the contract. That gives an even more business-centric validation that the transaction performed as expected. If my SP is supposed to create two offsetting transactions (use case: a reversal), I can indicate in the schema that the "Summary/PaymentRecord" node must appear exactly twice, and possibly that the second amount must be negative; so if for any reason one insert was not successful, or another developer forgot to write it, that would not slip past SQL. This is added value compared with just checking @@ROWCOUNT after inserting into each table, which still does not solve the problem of a new developer not knowing that, having altered something, he should insert one more row with a negative amount. I simply return and record an XML summary of the business transaction, and that summary also gets validated; it acts as a higher-level guard against anything changing. This is very agile, although I don't personally like that buzzword. Anyway, doing it the XML-schema way, you get the added value of a documented expected result, expressed in the form of a tangible XSD contract. It introduces fewer bugs from any new developer possibly unfamiliar with the specific nature of your business.
I understand that if you return a million records this may not be suitable (not everything is for everybody); then at least this technique may be used in a variation: use XML as the input and return a regular rowset. It all depends on one's needs, but it has worked perfectly in my enterprise system, and believe me, for any new enhancement or fix request I rarely need to go and mess around in the application code. It has boosted my productivity to the max, since I test only one layer (the db), because the contract guarantees the app already works with that particular contract. I can quickly apply fixes and structural enhancements to my db without touching the app. Everything is abstracted.
Some people may be concerned about performance; don't ask me, try it for yourself and see if it works for you. Just as an example, posting a batch of ~300 payments (moving them from one table to another) went from 1-2 minutes the ORM way down to under 1 second with all the logic in an SP. And what about db/app developer productivity? The fix or enhancement maintenance time in my team went down to a couple of hours, instead of weeks to recompile the app, retest the whole thing, push the app code through the staging server, wait for the next build-and-publish cycle, and then pray the result set matches what the app expects. That way is horrible; good luck to those who want to continue with it. I just fix my SP or tables, test them, and voilà!
Wow! Rajmund Rzepecki: that is an absolutely ingenious idea. I suppose if you wanted to return multiple result sets you could either add them to the XSD or just output multiple XML variables.
Also, I'll bet it makes building an auditing system a breeze as well: You can store in your auditing table the XML request exactly as it was received, and even (in some cases) save the response XML as well (perhaps so that the operation could be reversed, if necessary). Truly brilliant!
I believe you when you say XML serialization is much faster than an ORM system, but have you benchmarked the performance difference between returning a set of rows and returning an XML document? I know XML operations are very fast in SQL Server (especially with an XSD), and serializing and deserializing XML in .NET code is very fast as well, but I imagine there must be some performance hit.
Thank you for the question. I have not performed any formal test comparing a rowset to its XML counterpart, since in a complex system that is not trivial. I would say that at a certain level the two become incomparable, to the degree that the results would no longer be meaningful for me; that's why most people limit their testing to the plain-vanilla performance of the XML datatype, and/or XML versus a varchar. But let me go on; I am not escaping from answering your question, at least not yet 😉 Instead, I will point you to several articles where people actually did such comparisons and had a few other interesting findings; there is no reason I would not believe them. For me, the simplicity and full control of the design, and the reduced ongoing maintenance, were much more important, even though many people don't see that benefit straight away. My earlier example of performance dropping from a minute to a second referred to encapsulating all the logic in an SP versus an ORM; that was more to open ORM people's eyes than anything to do with XML in SQL Server 🙂
Before I give you the links to the performance tests in the next post, I should say that I already use the XML style for auditing both input and output (a service log of sorts). It's then easy for a web developer to pull it out into some form of history grid without any further processing on the database side (retrieve the XML string as-is); the web dev can format those details in the app any way they like using XSLT, or connect the XML as an XmlDataSource for ASP.NET to a GridView or TreeView.
Also, I plan to work on generating such XML for some critical tables where edits to individual rows must be logged. I could generate XML from the "deleted" and "inserted" tables in a trigger, laying out the previous and new values nicely in the same node to serve as a hard copy. Storing that XML saves me from re-defining all the columns, mirrored in duplicated pairs (old value, new value). Also, my db design for temporal data uses XML more and more. Why? Because for data which are identical in shape but not yet treated by users as official (unposted payments, or an audit-trail summary), I do not need to rebuild the whole 3NF design and FKs for all the temporal tables, which might number, say, 20. Imagine there is already a full 3NF design for the official transactional data; then why, for the sake of God, would I need to create 20 other tables to serve the same, but temporal, data? Those data are validated against a schema when persisted, and during posting later they will be validated again against the true 3NF (with SET XACT_ABORT ON).
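A rough sketch of that audit-trigger idea (table and column names are illustrative):

CREATE TRIGGER SomeSchema.Payments_AuditXml
ON SomeSchema.Payments
AFTER UPDATE
AS
    INSERT INTO SomeSchema.PaymentAudit (ChangedAt, OldRows, NewRows)
    SELECT SYSDATETIME(),
           (SELECT * FROM deleted  FOR XML PATH('Row'), ROOT('Old'), TYPE),
           (SELECT * FROM inserted FOR XML PATH('Row'), ROOT('New'), TYPE);
GO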
Last but not least, the beauty of this XSD approach is that it generates the C#/VB code for me. I do not write DTO/POCO classes by hand anymore; I just run all the XSD files through the command line (xsd.exe, or XSDObjGen) and use the generated classes straight in the app. This way I have no more need for any third-party so-called code generators, which supposedly traverse all your tables, views, and SPs for minutes or hours in a db, retrieving SQL metadata to give you the classes. No need; I do not want any table/column residue in the app anymore. No more ties.
My DAL (Data Access Layer) is greatly simplified; again, there is no need to buy or use any ORM. ORMs claim to do this for you, but they glue all the generated table classes together with their (and only their) functionality in the DAL, since every class must carry some notation saying where it comes from (they usually use .NET class attributes). My generated classes are plain and simple, and I extend them with my own partial classes for the functionality needed (e.g. I write "SavePayment"). The simplicity of the DAL comes from having one agreed style for all my sprocs: they all accept one or two params ("@InputData", always XML) and return a scalar XML value.
This way my DAL is generic: I mostly have only one method in the whole DAL, something like "CallMySuperProcedure(string name, object o)", and it knows it accepts one common parameter named "@InputXML", so it serializes the generic input object and passes it into the proc as XML. Again, no need for a code generator to figure out the params for every sproc. The second param, depending on the individual audit-trail design, is the userName, but it could also be included in the XSD; whatever fits you better. Do we want userName in every business class? Maybe.
Finally, when I see a system with SPs that have 50 parameters, or people putting comma-separated strings in to represent the master/detail of products selected in an order, that instantly tells me something is seriously wrong with the design. 10 years ago I was making arguments against using XML anywhere, even in web development and WSDL web services, as I believed the machines (network and computing power) were simply not ready for it; I preferred my own handcrafted TCP communication over sockets. But we are in the 21st century already, and it's a completely different story.
You really could return everything you want in one XML document, if all the data relate to the same purpose, or two if the second is, say, an audit trail. One can make the schema really concise: ProductName becomes PN, with the id as an attribute, and certain nodes declared optional. If one really wants, the XML can carry not much more overhead than TDS. The benefit comes when you send hierarchical data: in the equivalent rowset in TDS you have to send the composite key or parent ID many times over to relate the data, which wastes stream space just as people say XML does. Most systems talk db <-> web server, so the XML is not being sent "across the Internet wire" and is not vulnerable to Internet latency. Are the people who hate XML because it's too verbose really evolving? I did…
OK, please refer to the several articles below. An interesting one on 15seconds.com tests XQuery performance in an UPDATE statement (it is part 3); you can navigate to the second part of the article, which covers INSERT. There is another comparing the XML data type against a varchar, and, as an example of the benefits, read about the better way to split (decompose) comma-separated input using XML, for those who have done that in the past. That one is the killer: it avoids loops, cursors, and recursive CTEs. I recommend glancing over all the links given below. And on the benefit of using XML instead of varchar: sometimes it may be better to aggregate many strings into an XML document and then process it.
http://www.sqlservercentral.com/articles/XML/63633/
http://www.sqlservercentral.com/articles/XML/66932/
http://www.15seconds.com/issue/050818.htm
http://developers.de/blogs/damir_dobric/archive/2007/08/05/performance-comparison-of-most-popular-serializes.aspx
http://technet.microsoft.com/en-us/library/ms345118(SQL.90).aspx
http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1265579,00.html
http://searchsqlserver.techtarget.com/tip/0,289483,sid87_gci1296396,00.html
http://stackoverflow.com/questions/61233/the-best-way-to-shred-xml-data-into-sql-server-database-columns
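For reference, the XML-based split idiom referenced in those articles looks roughly like this (a sketch; input containing characters that need XML escaping would need extra handling):

DECLARE @csv NVARCHAR(MAX) = N'10,20,30';
DECLARE @x XML = CAST(N'<i>' + REPLACE(@csv, N',', N'</i><i>') + N'</i>' AS XML);

SELECT n.value('.', 'INT') AS Item
FROM @x.nodes('/i') AS t(n);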
Rajmund:
Thanks so much for your detailed response! I'll check out those links as soon as I get the chance.