This is a post in the SQL Interview series. These aren't trick or gotcha questions, they're just questions designed to scope out a candidate's knowledge around SQL Server and Azure SQL Database.
Section: Server Tuning
SQL Server 2008 introduced a server option for optimize for adhoc workloads.
Can you explain the problem that it is designed to solve, what most commonly causes the problem, and what impact this option has when you enable it?
On a SQL Server, you can end up with a large amount of memory being taken up by query plans that will likely never be used again.
There are two primary causes of this situation:
- The less common cause is that you have a large number of adhoc queries being executed as one-off queries.
- The most common cause is that you are using a framework like LINQ that does not handle data typing properly, and causes a large number of query plans to be created for each query. You might also be using lower-level frameworks like ADO.NET incorrectly.
As an example, command objects in ADO.NET have a parameters collection. If you add parameters using the AddWithValue() method, you specify the parameter name, and the value, but you do not specify the data type. The problem with this is that the framework then tries to work out what the data type is, from the value.
This means that if you pass a string like 'Hello', then it might guess nvarchar(5) but if you pass a string like 'Hello There', it might guess nvarchar(11).
The lengths of data types are part of the signature for each query plan. This means that you can easily end up with different query plans for every combination of every length of string that has ever been passed to the query. We call this Plan Cache Pollution.
The correct way to fix this is to avoid ever using methods like AddWithValue() and instead using a method where you specify the data type. The challenge here is that many frameworks like LINQ have this behaviour baked in, and the developer cannot change it.
When the optimize for adhoc workloads option has been set, the first time a query is seen, the hash for the query is stored (so it can be remembered) but the query plan is not stored. The next time the same query is seen, the compiled plan is then stored for reuse.
This avoids the plan cache becoming littered with one-off query plans.
Another option taken by some frameworks, is to just use varchar(4000) or nvarchar(4000), or worse nvarchar(max) for all strings. That will of course avoid the plan cache pollution, but it's likely to then cause issues with memory grants in poor quality plans.