What types of objects are useful in SQL CLR?

I've had a number of people over the years ask about whether or not a particular type of object is a good candidate for SQL CLR integration. The rules that I normally apply are as follows:

| Database Object | Transact-SQL | Managed Code |
|---|---|---|
| Scalar UDF | Generally poor performance | Good option when limited or no data access |
| Table-valued UDF | Good option if data-related | Good option when limited or no data access |
| Stored Procedure | Good option | Good option when external access is required, or for limited data access |
| DML Trigger | Good option | Rarely a good option, as most perform substantial data access |
| DDL Trigger | OK option if only limited processing of XML EVENTDATA | Good option for extensive processing of XML EVENTDATA |
| Aggregate | Not possible | Good option |
| User-defined Data Type | Only alias types | Good option |

Scalar UDFs written in Transact-SQL are well known for causing performance problems in SQL Server environments. Managed code is often a good option (and generally a much faster one) for implementing scalar UDFs, as long as the functions do not depend on heavy data access.

Table-valued UDFs that are data-oriented are likely to be best implemented in Transact-SQL. A common use case for managed code in table-valued UDFs is functions that need to access external resources such as the file system, environment variables, or the registry.
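
To make the split concrete, here is a minimal C# sketch of both cases: a computation-only scalar UDF and a table-valued UDF that reads an external resource (environment variables). The function names and the output column definition are illustrative, not taken from any product sample.

```csharp
using System;
using System.Collections;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    // Computation-only scalar UDF: no data access, so it avoids the
    // row-by-row overhead that T-SQL scalar UDFs are known for.
    [SqlFunction(IsDeterministic = true, DataAccess = DataAccessKind.None)]
    public static SqlString ReverseString(SqlString input)
    {
        if (input.IsNull)
            return SqlString.Null;

        char[] characters = input.Value.ToCharArray();
        Array.Reverse(characters);
        return new SqlString(new string(characters));
    }

    // Table-valued UDF reaching outside the database: it enumerates the
    // server's environment variables. Accessing external resources like
    // this typically requires the assembly to be catalogued with
    // EXTERNAL_ACCESS rather than SAFE.
    [SqlFunction(FillRowMethodName = "FillEnvironmentRow",
                 TableDefinition = "Name nvarchar(4000), Value nvarchar(4000)")]
    public static IEnumerable ListEnvironmentVariables()
    {
        return Environment.GetEnvironmentVariables();
    }

    // Maps each dictionary entry returned above onto the output columns.
    public static void FillEnvironmentRow(object row, out SqlString name, out SqlString value)
    {
        DictionaryEntry entry = (DictionaryEntry)row;
        name = new SqlString(entry.Key.ToString());
        value = entry.Value == null ? SqlString.Null : new SqlString(entry.Value.ToString());
    }
}
```

Once the assembly is registered with CREATE ASSEMBLY, each function is exposed to queries with CREATE FUNCTION ... EXTERNAL NAME.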


Stored procedures have traditionally been written in T-SQL, and most should continue to be. There are very few good use cases for managed code in stored procedures; the exceptions are procedures that need to access external resources or perform complex calculations. Even then, it is worth considering whether code that performs these tasks should be implemented within SQL Server at all.
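
As an illustration of the external-resource case, the following is a rough sketch (the procedure name and the file-reading approach are assumptions, not a recommended design) of a CLR stored procedure that returns the contents of a text file to the caller through the SqlPipe.

```csharp
using System.Data.SqlTypes;
using System.IO;
using Microsoft.SqlServer.Server;

public partial class StoredProcedures
{
    // Reads a file from the server's file system and sends the text back
    // to the caller as a message. File access typically requires the
    // assembly to be catalogued with the EXTERNAL_ACCESS permission set.
    [SqlProcedure]
    public static void ReadTextFile(SqlString path)
    {
        if (path.IsNull)
        {
            SqlContext.Pipe.Send("No path was supplied.");
            return;
        }

        // Note: messages sent with SqlPipe.Send(string) are limited in
        // length; a real implementation would return a result set instead.
        string contents = File.ReadAllText(path.Value);
        SqlContext.Pipe.Send(contents);
    }
}
```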


Almost all DML triggers are heavily oriented towards data access and are written in T-SQL. There are very few valid use cases for implementing DML triggers in managed code.


DDL triggers are also often data-oriented. Some DDL triggers, though, need to do extensive XML processing, particularly of the XML EVENTDATA structure that SQL Server passes to them. The more extensive the XML processing required, the more likely it is that the DDL trigger is best implemented in managed code. Managed code would also be the better option if the DDL trigger needed to access external resources, but that is rarely a good idea within any form of trigger.
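
For example, a CLR DDL trigger can load the EVENTDATA XML into an XmlDocument and pick out individual elements. The sketch below assumes a dbo.DdlEventLog table exists to receive the parsed values; the trigger and table names are hypothetical.

```csharp
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.Xml;
using Microsoft.SqlServer.Server;

public partial class Triggers
{
    // DDL trigger body: parses the EVENTDATA document that SQL Server
    // exposes through SqlContext.TriggerContext and logs two of its
    // elements. The trigger is bound to DDL events with a separate
    // CREATE TRIGGER ... ON DATABASE ... EXTERNAL NAME statement.
    public static void LogTableDdl()
    {
        SqlXml eventData = SqlContext.TriggerContext.EventData;
        if (eventData == null || eventData.IsNull)
            return;

        XmlDocument document = new XmlDocument();
        document.Load(eventData.CreateReader());

        XmlNode eventType = document.SelectSingleNode("/EVENT_INSTANCE/EventType");
        XmlNode objectName = document.SelectSingleNode("/EVENT_INSTANCE/ObjectName");

        // Write the parsed values to a log table over the context connection.
        using (SqlConnection connection = new SqlConnection("context connection=true"))
        {
            connection.Open();
            SqlCommand command = new SqlCommand(
                "INSERT INTO dbo.DdlEventLog (EventType, ObjectName) VALUES (@type, @name)",
                connection);
            command.Parameters.AddWithValue("@type", eventType == null ? "(unknown)" : eventType.InnerText);
            command.Parameters.AddWithValue("@name", objectName == null ? "(unknown)" : objectName.InnerText);
            command.ExecuteNonQuery();
        }
    }
}
```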


Transact-SQL offers no concept of user-defined aggregates; these must be implemented in managed code. A common use case for user-defined aggregates is to recreate functionality that exists in another database engine while migrating to SQL Server. For example, if the source engine provides a RANGE function, a replacement can be written in managed code to avoid rewriting the application code. As a further example, the SQL Server Migration Assistant uses managed code to ease Oracle migrations. SQL Server 2005 limited user-defined aggregates to state that could be serialized in 8 KB of data. SQL Server 2008 raised this limit to 2 GB and also introduced the ability to create multi-column aggregates.
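
A RANGE-style aggregate of the kind described above could be sketched roughly as follows (the struct name and the choice of SqlDouble are assumptions); it is then registered with CREATE AGGREGATE once the assembly is catalogued.

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

// User-defined aggregate returning MAX(value) - MIN(value) over a group.
// Format.Native is sufficient because the state is just two SqlDouble fields.
[Serializable]
[SqlUserDefinedAggregate(Format.Native,
    IsInvariantToNulls = true,
    IsInvariantToDuplicates = true,
    IsInvariantToOrder = true,
    IsNullIfEmpty = true)]
public struct Range
{
    private SqlDouble minimum;
    private SqlDouble maximum;

    // Called once per group before any values are accumulated.
    public void Init()
    {
        minimum = SqlDouble.Null;
        maximum = SqlDouble.Null;
    }

    // Called once per row in the group; NULLs are ignored.
    public void Accumulate(SqlDouble value)
    {
        if (value.IsNull)
            return;
        if (minimum.IsNull || value.Value < minimum.Value)
            minimum = value;
        if (maximum.IsNull || value.Value > maximum.Value)
            maximum = value;
    }

    // Combines partial results when SQL Server parallelises the aggregation.
    public void Merge(Range other)
    {
        if (!other.minimum.IsNull)
            Accumulate(other.minimum);
        if (!other.maximum.IsNull)
            Accumulate(other.maximum);
    }

    // Produces the final result for the group.
    public SqlDouble Terminate()
    {
        if (minimum.IsNull || maximum.IsNull)
            return SqlDouble.Null;
        return maximum - minimum;
    }
}
```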


Transact-SQL offers the ability to create alias data types, but these are not really new data types; they are more like subsets (or subclasses) of existing built-in data types. Managed code offers the ability to create entirely new data types and to determine not only what data is stored but also how the data type behaves. It is important to understand, however, that the intention of user-defined CLR data types is not to turn SQL Server into an object-oriented database. Data types should relate to the storage of basic data, not to objects such as employees or customers.
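
As a minimal sketch of what such a type looks like (the classic point example; the type name, the 'x:y' string format and the DistanceFromOrigin method are purely illustrative), a CLR user-defined type supplies both the storage and the behavior:

```csharp
using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

// A small value type storing basic data (two coordinates) and exposing
// behavior that can be called as a method from T-SQL.
[Serializable]
[SqlUserDefinedType(Format.Native, IsByteOrdered = true)]
public struct Point : INullable
{
    private bool isNull;
    private double x;
    private double y;

    // Required: reports whether this instance represents NULL.
    public bool IsNull
    {
        get { return isNull; }
    }

    // Required: the NULL instance of the type.
    public static Point Null
    {
        get
        {
            Point point = new Point();
            point.isNull = true;
            return point;
        }
    }

    // Required: converts a string literal such as '3:4' into the type.
    public static Point Parse(SqlString value)
    {
        if (value.IsNull)
            return Null;

        string[] parts = value.Value.Split(':');
        Point point = new Point();
        point.x = double.Parse(parts[0]);
        point.y = double.Parse(parts[1]);
        return point;
    }

    // Required: converts the value back into its string form.
    public override string ToString()
    {
        return isNull ? "NULL" : x + ":" + y;
    }

    // Behavior attached to the type rather than stored data.
    public double DistanceFromOrigin()
    {
        return Math.Sqrt(x * x + y * y);
    }
}
```

From T-SQL the column or variable is declared with the UDT name and the method is invoked with instance syntax, for example @p.DistanceFromOrigin().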

3 thoughts on “What types of objects are useful in SQL CLR?”

  1. Great article!
    I generally agree with your observations. Some personal comments:
    – Scalar UDFs are suboptimal (and I still dream of better unmanaged support): performance is not very good, and when you add the high cost of deployment and application lifecycle, this ends up being a poor choice in most real-world scenarios I have stumbled upon. For example, MD5 checksums perform much better in T-SQL, even if you need extra logic in T-SQL to work around the HASHBYTES 8K limit, NULL replacements and that kind of thing.
    If/when .NET 4 gets supported, there might be some more cases for UDFs, e.g. using memory-mapped files, LINQ, workflow or other features.
    – The CLR aggregates are great and can be used to create advanced functionality where lots of data needs to be taken into account to produce simple results (say, for identity-resolution scenarios). The 2 GB limit is of course not the real limit, since you have much less memory available inside the CLR in the real world, and scenarios where this is a factor should obviously be solved outside the database.
    /G
