I've had a number of people over the years ask whether a particular type of object is a good candidate for SQL CLR integration. The rules that I normally apply are as follows:
| Database Object | Transact-SQL | Managed Code |
| --- | --- | --- |
| Scalar UDF | Generally poor performance | Good option when limited or no data access is needed |
| Table-valued UDF | Good option if data-related | Good option when limited or no data access is needed |
| Stored Procedure | Good option | Good option when external access is required or only limited data access is needed |
| DML Trigger | Good option | Rarely a good option as most perform substantial data access |
| DDL Trigger | OK option if only limited processing of XML EVENTDATA is needed | Good option for extensive processing of XML EVENTDATA |
| Aggregate | Not possible | Good option |
| User-defined Data Type | Only alias types | Good option |
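To make the scalar UDF row concrete, here is a minimal sketch of the kind of compute-only function that suits managed code; the IsMatch name and the regular-expression example are mine for illustration, not from any particular product sample:

```csharp
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Compute-only scalar UDF: no data access, so managed code suits it well.
    [SqlFunction(IsDeterministic = true, IsPrecise = true)]
    public static SqlBoolean IsMatch(SqlString input, SqlString pattern)
    {
        // Propagate NULL inputs as a NULL result.
        if (input.IsNull || pattern.IsNull) return SqlBoolean.Null;
        return Regex.IsMatch(input.Value, pattern.Value);
    }
}
```

Once the assembly is catalogued, the function is bound with CREATE FUNCTION ... EXTERNAL NAME and called like any other scalar UDF.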
Stored procedures have traditionally been written in T-SQL, and most should continue to be. There are very few good use cases for managed code in stored procedures; the exceptions are procedures that need to access external resources or perform complex calculations. Even then, it is worth considering whether code that performs these tasks should be implemented within SQL Server at all.
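As a sketch of the external-resource exception, a CLR stored procedure can reach the file system, which T-SQL cannot do directly. The ReadTextFile name is hypothetical, and the assembly would need to be catalogued with EXTERNAL_ACCESS permission:

```csharp
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    // Reads an external text file and returns its contents via an OUTPUT parameter.
    // Requires an assembly catalogued with PERMISSION_SET = EXTERNAL_ACCESS.
    [SqlProcedure]
    public static void ReadTextFile(SqlString path, out SqlString contents)
    {
        contents = path.IsNull
            ? SqlString.Null
            : new SqlString(File.ReadAllText(path.Value));
    }
}
```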
Almost all DML triggers are heavily oriented towards data access and are written in T-SQL. There are very few valid use cases for implementing DML triggers in managed code.
DDL triggers are also often data-oriented. Some DDL triggers, though, need to do extensive XML processing, particularly of the XML EVENTDATA structure passed to these triggers by SQL Server. The more extensive the XML processing required, the more likely it is that the DDL trigger would be best implemented in managed code. Managed code would also be a better option if the DDL trigger needed to access external resources, but that is rarely a good idea within any form of trigger.
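As a minimal sketch of that XML-processing case (the trigger name and the choice of the CREATE_TABLE event are mine for illustration), a CLR DDL trigger receives EVENTDATA through SqlTriggerContext and can apply the full .NET XML stack to it:

```csharp
using System.Xml;
using Microsoft.SqlServer.Server;

public partial class Triggers
{
    // DDL trigger that inspects the EVENTDATA XML for CREATE TABLE statements.
    [SqlTrigger(Name = "CaptureCreateTable", Target = "DATABASE", Event = "FOR CREATE_TABLE")]
    public static void CaptureCreateTable()
    {
        SqlTriggerContext context = SqlContext.TriggerContext;
        if (context.EventData.IsNull) return;

        // Load the EVENTDATA structure for processing with the .NET XML APIs.
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(context.EventData.Value);
        XmlNode name = doc.SelectSingleNode("/EVENT_INSTANCE/ObjectName");
        SqlContext.Pipe.Send("Table created: " + (name == null ? "(unknown)" : name.InnerText));
    }
}
```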
Transact-SQL offers no way to create user-defined aggregates; these need to be implemented in managed code. A common use case for user-defined aggregates is to recreate functionality that exists in other database engines when migrating to SQL Server. For example, if another database engine provides a RANGE function, a replacement could be written in managed code to avoid the need to rewrite application code. As a further example, the SQL Server Migration Assistant uses managed code to ease Oracle migrations. SQL Server 2005 limited user-defined aggregates to those whose state could be serialized in 8KB of data. SQL Server 2008 raised this limit to 2GB and also introduced the ability to create multi-column aggregates.
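To illustrate the RANGE example above (assuming RANGE means the difference between the largest and smallest values, as in some other engines), a minimal sketch of a managed replacement might look like this:

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedAggregate(Format.Native, IsInvariantToNulls = true,
    IsInvariantToDuplicates = true, IsInvariantToOrder = true)]
public struct Range
{
    private SqlDouble _min;   // smallest value seen so far
    private SqlDouble _max;   // largest value seen so far

    public void Init()
    {
        _min = SqlDouble.Null;
        _max = SqlDouble.Null;
    }

    public void Accumulate(SqlDouble value)
    {
        if (value.IsNull) return;   // ignore NULLs, like the built-in aggregates
        if (_min.IsNull || value.Value < _min.Value) _min = value;
        if (_max.IsNull || value.Value > _max.Value) _max = value;
    }

    // Combines partial results when SQL Server parallelizes the aggregation.
    public void Merge(Range other)
    {
        Accumulate(other._min);
        Accumulate(other._max);
    }

    public SqlDouble Terminate()
    {
        return _min.IsNull ? SqlDouble.Null : _max - _min;
    }
}
```

After CREATE ASSEMBLY and CREATE AGGREGATE, it could then be called as, say, SELECT dbo.Range(Price) FROM Products.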
Transact-SQL offers the ability to create alias data types, but these are not really new data types; they are more like subsets (or subclasses) of existing built-in data types. Managed code offers the ability to create entirely new data types that determine not only what data is stored but also how the data type behaves. It is important to understand, however, that the intention of user-defined CLR data types is not to convert SQL Server into an object-oriented database. Data types should relate to the storage of basic data, not to objects such as employees or customers.
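Here is a minimal sketch of such a type, storing basic data in line with that caution; the Point name and its colon-separated string format are mine for illustration:

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

[Serializable]
[SqlUserDefinedType(Format.Native, IsByteOrdered = true)]
public struct Point : INullable
{
    private bool _isNull;
    private double _x;
    private double _y;

    public bool IsNull { get { return _isNull; } }

    public static Point Null
    {
        get { Point p = new Point(); p._isNull = true; return p; }
    }

    // Required: build the type from a string such as '3:4'.
    public static Point Parse(SqlString s)
    {
        if (s.IsNull) return Null;
        string[] parts = s.Value.Split(':');
        Point p = new Point();
        p._x = double.Parse(parts[0]);
        p._y = double.Parse(parts[1]);
        return p;
    }

    public override string ToString()
    {
        return _isNull ? "NULL" : _x + ":" + _y;
    }

    // Behaviour travels with the type: callable from T-SQL as column.DistanceFromOrigin().
    public double DistanceFromOrigin()
    {
        return Math.Sqrt(_x * _x + _y * _y);
    }
}
```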
Great article!
I generally agree with your observations. Some personal comments:
– Scalar UDFs are suboptimal (and I still dream of better unmanaged support); performance is not very good, and when you add the high cost of deployment and application lifecycle management, I end up with this being a poor choice in most real-world scenarios I have stumbled upon. For example, MD5 checksums perform much better in T-SQL, even if you need additional logic there to work around the HASHBYTES 8K limit, NULL replacements and those kinds of things.
If/when .NET 4 gets supported, there might be some more cases for UDFs, e.g. using memory-mapped files, LINQ, workflow or others.
– The CLR aggregates are great and can be used to create advanced functionality where lots of data needs to be taken into account to produce simple results (say, for identity resolution scenarios). The 2GB limit is of course not the real limit, since you have much less memory available inside the CLR in the real world, and scenarios where this is a factor should obviously be solved outside SQL Server.
/G
Very true. I used a scalar UDF when I had to calculate and verify ISBN check digits.
Great post, Greg. My students will definitely benefit from it.