Well the past month or two have been “busy”; I bought an old motorbike and restored it (win!), I sold my main motorbike and bought a new one (bigger win!), I then crashed it (not badly, but enough to ruin my day – boo!), I got the repaired bike back 4 weeks later (yay!) – and that’s just the 2-wheeled events, not the several dental surgeries my 7yo daughter’s needed, or the old “should-we-buy-a-new-house” gem. Phew…
Of course alongside the above events, work has been as busy as ever. I’m trying to figure out if it’s actually possible for it to be getting busier, when a year ago I thought it was already crazy-busy. Anyway, we’ve got a number of slightly more meaty projects on the go involving major architectural changes, upgrades, migrations, etc – which are stressful but more rewarding than most normal BAU work.
Anyway – getting to the point of this post, one of the things I’ve been thinking about on the side (to some extent) has involved some NoSQL technologies. Well, one so far, in the form of Redis. Nothing concrete so far, just a VM on my laptop that I’ve been playing with.
This got me thinking (a dangerous event at the best of times); is there a method of efficiently transferring data between disparate database systems, such as SQL Server and Redis, other than writing higher level applications to handle the ETL process. I mean, that obviously works (and maybe I just don’t understand enough about how people currently do this) but it seems to me like there should be a more efficient, “lower level” method of pumping a stream of data out of SQL and into Redis (for example). Could a CLR procedure fill the gap here perhaps?
The reason I was playing with this idea in the first place was related to a few bits of functionality which are sort of related; search autosuggest, a recommendation engine, and a context sensitive spell checker. All of these require massive sets of fairly narrow data (i.e. key-value pairs) that need to be queried in a predictable manner, very very quickly. All of these are actually already implemented using SQL Server, and handle some not-to-be-sneezed-at loads as it is.
So SQL Server obviously works to an extent, and performance can be increased significantly by using solid state storage (SSDs or cards like the FusionIO range), or even further by using system memory (via some kind of ram drive software). The data itself isn’t mission critical, so redundancy isn’t as crucial, meaning volatile storage can be used.
But at some point it becomes a lot more difficult to scale SQL to handle increasing load (financially and practically), which is where I think solutions like Redis, MemCacheDB, and quite a few others fit the bill perfectly.
This is just one example where a key-value store makes sense (to me at least), although there are of course other scenarios where you might need a document store instead (like MongoDB) or the Map/Reduce batch-processing power of something like HBase. In fact, Microsoft already has a Hadoop (or is that “an Hadoop…“?) connector, so I wonder if we will eventually have connectors for other systems as well?
I guess for now I’m resigned to brushing up on my horribly dusty and inadequate .Net dev skills in order to try out my whacky grand plans of disparate databases talking to each other nicely like good little children. Any dev’s feel like giving me a hand? :-p
One random, rambling post; check. 😉