Plan Cache Pollution: Avoiding it and Fixing it

While SQL Server’s plan cache generally is self-maintaining, poor application coding practices can cause the plan cache to become full of query plans that have only ever been used a single time and that are unlikely to ever be reused. We call this “plan cache pollution”.

Causes

The most common cause of these issues are programming libraries that send multiple variations of a single query. For example, imagine I have a query like:

SELECT c.CustomerID, c.TradingName, c.ContactName, c.PhoneNumber FROM dbo.Customers AS c WHERE c.CustomerID = @CustomerID AND c.BusinessCategory = @BusinessCategory AND c.ContactName LIKE @ContactNameSearch ORDER BY c.CustomerID;

The query has three parameters: @CustomerID, @BusinessCategory, and @ContactNameSearch. If the parameters are always defined with the same data types ie: @BusinessCategory is always nvarchar(35) and so on, then we will normally end up with a single query plan. However, if on one execution the parameter is defined as nvarchar(35), and on the next execution it is defined as nvarchar(20), and on yet another execution it is defined as nvarchar(15), each of these queries will end up with different query plans. A similar problem would also occur if any of the plan-affecting SET options are different on each execution ie: if DATEFORMAT was dmy for one execution, and mdy for the next, you’ll also end up with a different plan.

For more details on the internal causes of this or for a list of plan-affecting SET options, you might want to read the whitepaper that I prepared for the MSDN site. The latest version was for SQL Server 2012 and can be found here: https://msdn.microsoft.com/en-us/library/dn148262.aspx (Plan Caching and Recompilation in SQL Server 2012).

So what on earth would cause someone to send parameters defined differently each time? The worst offenders are not queries that are written intentionally, they are queries written by frameworks.

As an example, while using the SqlCommand object in ADO.NET, it is convenient to use the AddWithValue(parametername, parametervalue) method of the Parameters collection. But notice that when you do this, you do not specify the data type of the parameter. ADO.NET has to derive an appropriate data type based on the data that you have provided. For string parameters, this can be particularly troubling. If the parameter value is initially “hello”, a query plan with an nvarchar parameter length of 5 will be cached after the command is executed. When the query is re-executed with a parameter value of “trouble”, the command will appear to be different as it has an nvarchar parameter with a length of 7.

The more the command is executed, the more the plan cache will become full of plans for different length string parameters. This is particularly troubling for commands with multiple string parameters as plans will end up being stored for all combinations of all lengths of all the parameters. Some later variants of these libraries are improved by always deriving strings as nvarchar(4000). That’s not ideal but it’s much better than the previous mechanism.

While someone coding with ADO.NET can use another method to add a parameter ie: one that allows specifying the data type as well, developers using higher level libraries do not have that option. For example, Lync to SQL uses AddWithValue() within the framework. The user has no control over that. Ad-hoc queries generated by end-user query tools can also cause a similar problem where many combinations of similar queries can end up becoming cached.

Avoiding Plan Cache Pollution

As mentioned, to work around such a problem, the application should use a method to add the parameter that allows specifying the data type precisely.

As an example, nvarchar(100) might be used as the data type for each execution in the above example, if we know that all possible parameter lengths are less than 100.

Treating Plan Cache Pollution

There are several additional options that can help in dealing with plan cache pollution issues:

FORCED PARAMETERIZATION

FORCED PARAMETERIZATION can be set at the database level. SQL Server will often auto-parameterize queries by determining that a value looks like a parameter, even though you didn’t specify it as a parameter. Using the FORCED PARAMETERIZATION setting makes SQL Server become much more aggressive in deciding which queries to auto-parameterize. The down-side of this option is that it could potentially introduce parameter-sensitivity problems. (This option was added in SQL Server 2005).

OPTIMIZE FOR ADHOC WORKLOADS

OPTIMIZE FOR ADHOC WORKLOADS is an sp_configure server level option. When set, SQL Server only caches a plan stub on the first execution of an ad-hoc query. The next time the same query is executed, the full plan is stored. Plan stubs are much smaller than query plans and this option ensures that the plan cache is not filled by query plans that have never been reused. (This option was added in SQL Server 2008). We tend to enable this option on most servers.

DBCC FREESYSTEMCACHE

Sometimes you can get into a situation where you simply cannot avoid the queries from creating this situation and you need to deal with it. DBCC FREESYSTEMCACHE can be used to clear the query cache. One little understood option on it however, is that you can then specify a particular Resource Governor resource pool. It then only clears the plans associated with that resource pool. (This command was first available in SQL Server 2005 but the option to clear a specific resource pool was added in SQL Server 2008).

We often use this method to work around plan cache pollution issues. We try to isolate the badly-behaved applications or ad-hoc queries into one or more separate resource pools using Resource Governor. Then periodically, (perhaps every 5 or 10 minutes), we clear the plan cache for members of this “tough luck” pool.

Best advice is to try to avoid the situation in the first place by appropriate coding techniques but that option isn’t available to everyone.

2015-06-03