This is probably one of those things you’ll read and think “What a dumbass, I’d never make that mistake!” – which is fine, but I thought I’d post about it nevertheless since it managed to trip me up twice in the space of a week.
I’ve got a script which grabs data from several large tables (10’s of millions of rows each), and aggregates and sanitises the data in 5 or 6 distinct steps. I used to do this using several temp tables in the “bad old days”, but now use easier to read (to me anyway) Common Table Expressions – or CTE’s.
This post isn’t going to tell you how to use CTE’s – there’s a pile of great stuff out there waiting for you to dig it up using your search engine of choice.
This post will however demonstrate how 3 CTE’s managed to first trick me into thinking I was a performance tuning god, and then confound me with an “impossible to fix” bug.
Consider the following pseudo code:
;with cte_1 as
group by blah
having AVG(value) >= 10
group by some_more_grouping
I wrote something similar to this to grab some data from a large table, whittle it down with some WHERE clauses in the 1st CTE, then run some gnarly functions and more WHERE clauses in the 2nd CTE, and finally some more aggregations in a 3rd CTE before finally inserting the results into a staging table. So each CTE should reference the CTE above it.
As you can see above though, CTE_3 is referencing CTE_1 instead of CTE_2. This basically means that SQL ignores CTE_2 (since it’s output isn’t used anywhere). This explains why I thought I was a performance tuning god – I made a few changes to the query, and the run-time went from 6 minutes to 12 seconds! I also soon found that the data didn’t look right, and it took 45 minutes of pointless poking around before I had that forehead-smacking realisation that a typo was the cause of my pain (and my runtime was back up to 6 minutes, awww man!).
Yes, this could be (and was) done in a single statement, but I broke it up into separate CTE’s for readability.