CURSORs and WHILE-Loops

can of wormsYes, I know this is a giant (some might say pointless, and I wouldn’t totally disagree) can of worms I’m opening here, but stay with me for a minute.

Why?

We were discussing this at work a few months ago, and I was one of those people who followed “conventional wisdom” which says that SQL cursors are evil, and should be replaced with WHILE-loops if a set-based alternative isn’t possible (or practical).

So I thought I’d do some testing for myself.  I know this has been done more than once before (I won’t mention any articles specifically, but they’re easy to find using your search engine of choice) – but I like to figure things out for myself rather than read a blog and take the author’s word as infallible.  Some of the other tests I found online also only concentrated on one specific metric when measuring performance (e.g. physical I/O), or were too obviously biased for my taste.

Living in the Real World

There are also what I refer to as “non-real-world” arguments against one or the other, for example:

  • Cursors are best because they’re easier/quicker to write (usually less variables and/or temp tables required).
  • While-loops are best because they’re inherently faster, since they don’t have to make use any of the internal mechanisms that cursors use to allow backwards and forwards data-set traversal.

The reason I call these “non-real-world” problems, is that I don’t think any DBA or developer worth their salt is going to worry too much about how many more characters one takes to type than the other, and I’ve very seldom (if ever) needed to use anything but a simple forward-only cursor – so these problems aren’t representative of the majority of use-cases in my experience.

Obviously if you take something like this super-simplified while-loop below, it would be difficult to write a cursor that would perform faster – but why the hell would you?

What?

Alright, down to business. Using the AdventureWorks database on my laptop (running 2008R2 build 10.50.2500.0), I wrote a simple test script which iterates through each SalesOrderHeader row (for online orders) and counts the number of SalesOrderDetail rows for each SalesOrderID.  I know… for someone who was just talking about “non-real-world” examples, this is pretty unrealistic – but the point is that it generates a measureable amount of I/O and CPU load, and I can reproduce the logic easily using both a while-loop and a cursor.

Here’s the code for the while-loop test:

Even for a simple while-loop there are multiple ways to write the logic – I’ve chosen a simple counter here, but didn’t see massive variances in the performance of the 2 or 3 methods I tested.

Here’s the code I used to test cursors:


This represents the simplest test (i.e. just a plain cursor declaration with no options other than TYPE_WARNING).  I used this as a baseline, to which I added various combinations of cursor options, such as STATIC vs DYNAMIC, LOCAL vs GLOBAL, etc.

To measure performance I profiled the execution of the main batch (i.e. not including the checkpoint, clearing the caches, etc) and recorded CPU, Duration, Reads, & Writes over 5 executions per test run (I actually ran each more than just 5 times, but results were so consistent that I only recorded the last 5 executions of each test).

Show me the money

Cursor DurationThe surprising thing for me (in this test case) was how poorly the while-loop performed.  The graph to the right shows the various cursor-based test runs, all of which completed in less than 1 second (click the image to viewer a larger copy).

The while-loop consistently took 30 seconds to run.

By the way, please feel free to point out any mistakes I’ve made.

Average ReadsThe reads show a similar story regarding the stark difference between the while-loop and any of the cursor tests.  The cursor read I/O figures are in the graph to the right, and the while-loop averaged well over 1.8 million reads (it dwarfed the others on the graph so I removed it).

I guess what I find interesting is that many people have run tests, and look at logical or physical I/O, but don’t take into account CPU or overall duration.  If my data is sitting in memory anyway, I don’t really care about those logical reads – but the difference between a query taking 100 vs 1000 ms of my CPU’s time is very important to me.  Am I neglecting to take anything else (important) into account?

Conclusion

I don’t have one…

These tests are too fickle to make any concrete observations – you could change just a few lines of code and get completely different results to what I’m seeing.

Personally, I will continue to use both where appropriate.  Before anyone has a heart attack let me clarify that I would always first use a set-based solution before resorting to either.

But… my rule of thumb is that I use CURSORS if I’m performing data manipulation, and a WHILE loop if I’m performing an action until some condition is met.  For cursors, I tend to use the “LOCAL STATIC FORWARD_ONLY READ_ONLY” options, just because they roll off the tongue so nicely. If you’re on the SQL Server team at Microsoft and you’re reading this – in the next version of SQL could you please consolidate these options down to a single “GO_BABY_GO” option instead? Winking smile

As a final note, highlighting why a set-based alternative is always better; running this query “properly” (i.e. set-based, not RBR) gave me just over 900 reads, and completed in 29 milliseconds. No contest really.

Cheers,

Dave

Leave a Reply