Saturday, March 18, 2006

SQL Server procedure cache weirdness

We have had a ongoing problem where a sqlserver instance logging web activity and running reports takes 100% of CPU. This is a dual xeon CPU box, with each xeon hyperthreaded which is almost equivalent to 4 CPUs.

Yesterday I had the luxury of looking at everything about this problem as it was occurring without the need to return the system to a more stable state rapidly.
I looked at everything, and there was no perfmon counter and no bad waits/blockers in the instance. Eventually I removed all the sessions which were consuming CPU without any response and tried to rerun the stored procedure they were calling.
That proceeded to hang as well. The normal duration of this call is 4-6 secs. For a report that is not too bad given the size of the tables.
Given that the procedure should have worked, that suggested there was a some problem with the stored procedure. I extracted the SQL which is run and executed it. The SQL worked and proceeded to return data (without hanging).
This suggested that there was something wrong with the actual calling or executing of the stored procedure.
Background, SQLserver parses and stores the stored procedure and the determined execution plan in memory in an area called the procedure cache. Given this stored procedure had been parsed in the past, this potentially meant there was a problem with the cached stored procedure and most likely the execution plan.

So I executed the stored procedure with the recompile option.
eg. EXEC usp_report_this WITH RECOMPILE

The result was the stored procedure now worked and return a result within the expected duration. When I removed the recompile option, the call proceeded to hang again. This confirmed that the stored procedure in the procedure cache was corrupt in some form.
So to get rid of the problem, I used DBCC FREEPROCCACHE to flush the procedure cache of all cached stored procedures and execution plans.
The retest of the call with the recompile option now worked as well.

Given I am still unable to determine why the procedure cache was corrupted or how that occurs we recreated the stored procedure and added the WITH RECOMPILE option, so that all future calls of that stored procedure will be recompiled.
eg.
create procedure usp_fred
declare ...
WITH RECOMPILE
as

Given the reason for the corrupt stored procedure is unknown at this point and searches on Microsoft and google didn't reveal anything meaningful. If I find out the reason I will update this blog entry or produce a new one with a more thorough explanation.

Adding the WITH RECOMPILE option is something which should only be used as a last resort. As the parsing and determining execution plan part of the execution of SQL or stored procedures can add significant CPU overhead. We are lucky the frequency of these reports is low enough not to be a significant problem.

Have Fun

No comments: