Yes, I know this is a giant (some might say pointless, and I wouldn’t totally disagree) can of worms I’m opening here, but stay with me for a minute.
We were discussing this at work a few months ago, and I was one of those people who followed “conventional wisdom” which says that SQL cursors are evil, and should be replaced with WHILE-loops if a set-based alternative isn’t possible (or practical).
So I thought I’d do some testing for myself. I know this has been done more than once before (I won’t mention any articles specifically, but they’re easy to find using your search engine of choice) – but I like to figure things out for myself rather than read a blog and take the author’s word as infallible. Some of the other tests I found online also only concentrated on one specific metric when measuring performance (e.g. physical I/O), or were too obviously biased for my taste.
Living in the Real World
There are also what I refer to as “non-real-world” arguments against one or the other, for example:
- Cursors are best because they’re easier/quicker to write (usually less variables and/or temp tables required).
- While-loops are best because they’re inherently faster, since they don’t have to make use any of the internal mechanisms that cursors use to allow backwards and forwards data-set traversal.
The reason I call these “non-real-world” problems, is that I don’t think any DBA or developer worth their salt is going to worry too much about how many more characters one takes to type than the other, and I’ve very seldom (if ever) needed to use anything but a simple forward-only cursor – so these problems aren’t representative of the majority of use-cases in my experience.
Obviously if you take something like this super-simplified while-loop below, it would be difficult to write a cursor that would perform faster – but why the hell would you?
DECLARE @i INT = 0; WHILE @i <= 1000000 SET @i += 1; -- 405ms on my laptop
Alright, down to business. Using the AdventureWorks database on my laptop (running 2008R2 build 10.50.2500.0), I wrote a simple test script which iterates through each SalesOrderHeader row (for online orders) and counts the number of SalesOrderDetail rows for each SalesOrderID. I know… for someone who was just talking about “non-real-world” examples, this is pretty unrealistic – but the point is that it generates a measureable amount of I/O and CPU load, and I can reproduce the logic easily using both a while-loop and a cursor.
Here’s the code for the while-loop test:
------------------------------------------------ -- WHILE LOOP (using counter) ------------------------------------------------ CHECKPOINT; DBCC DROPCLEANBUFFERS; GO SET NOCOUNT ON; GO IF OBJECT_ID('tempdb..#SOID') IS NOT NULL DROP TABLE #SOID; CREATE TABLE #SOID (ID INT IDENTITY, SalesOrderID INT NOT NULL); DECLARE @i INT = 1, @ii INT, @SalesOrderID INT, @LineCount INT; INSERT #SOID (SalesOrderID) SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE OnlineOrderFlag = 1; SET @ii = @@ROWCOUNT; WHILE @i <= @ii BEGIN SET @SalesOrderID = (SELECT SalesOrderID FROM #SOID WHERE ID = @i); SET @LineCount = ISNULL(@LineCount, 0) + ISNULL(( SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE SalesOrderID = @SalesOrderID ),0); SET @i += 1; END IF @LineCount <> 60398 BEGIN RAISERROR('Bad Total',16,1); PRINT @LineCount; END GO
Even for a simple while-loop there are multiple ways to write the logic – I’ve chosen a simple counter here, but didn’t see massive variances in the performance of the 2 or 3 methods I tested.
Here’s the code I used to test cursors:
------------------------------------------------ -- CURSOR ------------------------------------------------ --DECLARE cursor_name CURSOR [ LOCAL | GLOBAL ] -- [ FORWARD_ONLY | SCROLL ] -- [ STATIC | KEYSET | DYNAMIC | FAST_FORWARD ] -- [ READ_ONLY | SCROLL_LOCKS | OPTIMISTIC ] -- [ TYPE_WARNING ] CHECKPOINT; DBCC DROPCLEANBUFFERS; GO SET NOCOUNT ON; GO DECLARE @SalesOrderID INT, @LineCount INT; DECLARE SOID CURSOR TYPE_WARNING FOR SELECT SalesOrderID FROM Sales.SalesOrderHeader WHERE OnlineOrderFlag = 1; OPEN SOID; FETCH NEXT FROM SOID INTO @SalesOrderID; WHILE @@FETCH_STATUS = 0 BEGIN SET @LineCount = ISNULL(@LineCount,0) + ISNULL(( SELECT COUNT(*) FROM Sales.SalesOrderDetail WHERE SalesOrderID = @SalesOrderID ), 0); FETCH NEXT FROM SOID INTO @SalesOrderID; END CLOSE SOID; DEALLOCATE SOID; IF @LineCount <> 60398 BEGIN RAISERROR('Bad Total',16,1); PRINT @LineCount; END GO
This represents the simplest test (i.e. just a plain cursor declaration with no options other than TYPE_WARNING). I used this as a baseline, to which I added various combinations of cursor options, such as STATIC vs DYNAMIC, LOCAL vs GLOBAL, etc.
To measure performance I profiled the execution of the main batch (i.e. not including the checkpoint, clearing the caches, etc) and recorded CPU, Duration, Reads, & Writes over 5 executions per test run (I actually ran each more than just 5 times, but results were so consistent that I only recorded the last 5 executions of each test).
Show me the money
The surprising thing for me (in this test case) was how poorly the while-loop performed. The graph to the right shows the various cursor-based test runs, all of which completed in less than 1 second (click the image to viewer a larger copy).
The while-loop consistently took 30 seconds to run.
By the way, please feel free to point out any mistakes I’ve made.
The reads show a similar story regarding the stark difference between the while-loop and any of the cursor tests. The cursor read I/O figures are in the graph to the right, and the while-loop averaged well over 1.8 million reads (it dwarfed the others on the graph so I removed it).
I guess what I find interesting is that many people have run tests, and look at logical or physical I/O, but don’t take into account CPU or overall duration. If my data is sitting in memory anyway, I don’t really care about those logical reads – but the difference between a query taking 100 vs 1000 ms of my CPU’s time is very important to me. Am I neglecting to take anything else (important) into account?
I don’t have one…
These tests are too fickle to make any concrete observations – you could change just a few lines of code and get completely different results to what I’m seeing.
Personally, I will continue to use both where appropriate. Before anyone has a heart attack let me clarify that I would always first use a set-based solution before resorting to either.
But… my rule of thumb is that I use CURSORS if I’m performing data manipulation, and a WHILE loop if I’m performing an action until some condition is met. For cursors, I tend to use the “LOCAL STATIC FORWARD_ONLY READ_ONLY” options, just because they roll off the tongue so nicely. If you’re on the SQL Server team at Microsoft and you’re reading this – in the next version of SQL could you please consolidate these options down to a single “GO_BABY_GO” option instead?
As a final note, highlighting why a set-based alternative is always better; running this query “properly” (i.e. set-based, not RBR) gave me just over 900 reads, and completed in 29 milliseconds. No contest really.