Category: Other Tech

  • Combining different database technologies

    Bike and a Bunny
    Bike and a Bunny

    Well, the past month or two have been “busy”; I bought an old motorbike and restored it (win!), I sold my main motorbike and bought a new one (bigger win!), I then crashed it (not badly, but enough to ruin my day – boo!), I got the repaired bike back 4 weeks later (yay!) – and that’s just the 2-wheeled events, not the several dental surgeries my 7yo daughter’s needed, or the old “should-we-buy-a-new-house” gem. Phew…

    Of course alongside the above events, work has been as busy as ever. I’m trying to figure out if it’s actually possible for it to be getting busier, when a year ago I thought it was already crazy-busy. Anyway, we’ve got a number of slightly more meaty projects on the go involving major architectural changes, upgrades, migrations, etc – which are stressful but more rewarding than most normal BAU work.

    Anyway – getting to the point of this post, one of the things I’ve been thinking about on the side (to some extent) has involved some NoSQL technologies. Well, one so far, in the form of Redis. Nothing concrete so far, just a VM on my laptop that I’ve been playing with.

    This got me thinking (a dangerous event at the best of times): is there a method of efficiently transferring data between disparate database systems, such as SQL Server and Redis, other than writing higher-level applications to handle the ETL process? I mean, that obviously works (and maybe I just don’t understand enough about how people currently do this), but it seems to me like there should be a more efficient, “lower level” method of pumping a stream of data out of SQL and into Redis (for example). Could a CLR procedure fill the gap here perhaps?

    The reason I was playing with this idea in the first place was a few bits of functionality which are all loosely related: search autosuggest, a recommendation engine, and a context-sensitive spell checker. All of these require massive sets of fairly narrow data (i.e. key-value pairs) that need to be queried in a predictable manner, very, very quickly. All of these are actually already implemented using SQL Server, and handle some not-to-be-sneezed-at loads as it is.
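
    Just to make the shape of the problem concrete, here’s a rough sketch of the kind of narrow key-value table and lookup I have in mind (the names, sizes and scoring column are made up purely for illustration – not our actual schema):

    -- Illustrative only: a narrow key-value style table for autosuggest lookups
    CREATE TABLE dbo.SearchSuggest (
    	Term		NVARCHAR(100)	NOT NULL,	-- the key being looked up
    	Suggestion	NVARCHAR(200)	NOT NULL,	-- the value to return
    	Score		INT		NOT NULL,	-- used to rank suggestions
    	CONSTRAINT PK_SearchSuggest PRIMARY KEY CLUSTERED (Term, Score DESC, Suggestion)
    );

    -- The read pattern never changes: a predictable seek on the leading key column
    DECLARE @term NVARCHAR(100) = N'motorbike';
    SELECT TOP (10) Suggestion
    FROM dbo.SearchSuggest
    WHERE Term = @term
    ORDER BY Score DESC;

    Each lookup is a single clustered index seek on a known key – which is exactly the access pattern a key-value store like Redis is built around, hence the temptation to push this data out to one.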

    So SQL Server obviously works to an extent, and performance can be increased significantly by using solid state storage (SSDs or cards like the FusionIO range), or even further by using system memory (via some kind of ram drive software). The data itself isn’t mission critical, so redundancy isn’t as crucial, meaning volatile storage can be used.
    But at some point it becomes a lot more difficult to scale SQL to handle increasing load (financially and practically), which is where I think solutions like Redis, MemCacheDB, and quite a few others fit the bill perfectly.

    This is just one example where a key-value store makes sense (to me at least), although there are of course other scenarios where you might need a document store instead (like MongoDB) or the Map/Reduce batch-processing power of something like HBase. In fact, Microsoft already has a Hadoop (or is that “an Hadoop…“?) connector, so I wonder if we will eventually have connectors for other systems as well?

    I guess for now I’m resigned to brushing up on my horribly dusty and inadequate .NET dev skills in order to try out my whacky grand plans of disparate databases talking to each other nicely like good little children. Any devs feel like giving me a hand? :-p

    One random, rambling post; check. 😉

    Cheers
    Dave

  • Benchmarking I/O: The test rig

    Work has been a little crazy lately, but I’ve still been wanting to do some I/O performance testing since we’re due to get some new servers in soon, and I wanted to be able to compare apples with oranges. So this past weekend I decided to combine something I’ve been meaning to learn more about (PowerShell) with some SQL scripts based on those described by Paul Randal (blog | twitter) in his post about testing Fusion-IO ioDrive Duo SSDs.

    Just to be clear before going any further, I’ve only tested these scripts at small scale (i.e. on my laptop) and they worked well, but I’m happy to have any bugs and/or crappy code pointed out to me in the comments. 🙂 Thanks.

    So, because I tend to over-engineer things, I laid out my requirements early on to prevent any self-induced scope-creep. These were:

    • It must accept parameters for anything the user would need to change
    • It must create the test database and any other required objects
    • It should automatically re-run the test the specified number of times (to allow averaging of results)
    • It should automatically record test results without requiring me to intervene between test runs
    • If possible, it should be able to automatically change test configurations (e.g. number of database data files) and re-run the tests without intervention

    The jury’s still out on how many of these goals I achieved, but let’s dive in anyway. I’m not going to post snippets of all the code – otherwise it’ll just get too bulky.  Instead I’ll highlight the more important bits, and then upload the whole shebang at the end for you to download.

    The PowerShell script basically controls the flow of the test process, but there are 5 SQL scripts which do the actual heavy lifting. The first 2 files create the database (with the specified number of data files, and of the specified size), as well as a few tables used in the test:

    -------------------------------------------------------------------------- 
    -- Create test database 
    --------------------------------------------------------------------------
    SET @cmd = N'
    	IF DB_ID(''' + @db_name + ''') IS NOT NULL DROP DATABASE [' + @db_name + ']; 
    	CREATE DATABASE [' + @db_name + '] ON PRIMARY (
    		NAME = N''' + @db_name + ''', 
    		SIZE = ' + CAST(@data_file_size_mb AS NVARCHAR) + 'MB , 
    		FILEGROWTH = 250MB, 
    		FILENAME = N''' + @db_data_path + @db_name + '.mdf'')'; 
    
    -- Add additional files 
    WHILE @i < $(NoOfDataFiles) 
    BEGIN 
    	SET @cmd = @cmd + CHAR(10) + ',(NAME = N''' + @db_name + CAST(@i AS NVARCHAR) 
    		+ ''', SIZE = ' + CAST(@data_file_size_mb AS NVARCHAR) + 'MB , FILEGROWTH = 250MB, FILENAME = N''' 
    		+ @db_data_path + @db_name + CAST(@i AS NVARCHAR) + '.ndf'')'; 
    		
    	SET @i = @i + 1; 
    END 
    
    SET @cmd = @cmd + CHAR(10) + ' LOG ON (NAME = N''' + @db_name + '_log'', SIZE = ' 
    	+ CAST(CASE WHEN $(TargetDbSizeGB) <= 1 THEN 128 WHEN $(TargetDbSizeGB) > 1 AND $(TargetDbSizeGB) <= 5 
    		THEN 512 WHEN $(TargetDbSizeGB) > 5 AND $(TargetDbSizeGB) <= 10 THEN 2048 ELSE 4096 END AS NVARCHAR) 
    		+ 'MB, FILEGROWTH = 100MB, FILENAME = N''' + @db_log_path + @db_name + '_log.ldf''); ALTER DATABASE [' 
    		+ @db_name + '] SET RECOVERY SIMPLE;'; 
    
    EXEC sp_executesql @cmd, N'';

    The main work table is a copy & paste from Paul’s post mentioned earlier, and has a clustered index on a GUID column, which forces random I/O. I guess you could swap this for an integer/identity column to get sequential I/O instead.
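
    For reference, it looks roughly like this – I’m going from memory, so treat it as a sketch rather than Paul’s exact definition. The important bits are the NEWID() default on the clustered key, and a default on every column so the insert script can simply use DEFAULT VALUES:

    -- Rough sketch of the work table (not the exact definition from Paul's post)
    CREATE TABLE dbo.MyBigTable (
    	c1	UNIQUEIDENTIFIER	NOT NULL DEFAULT NEWID(),	-- random clustered key = random I/O
    	c2	DATETIME		NOT NULL DEFAULT GETDATE(),
    	c3	CHAR(400)		NOT NULL DEFAULT 'a'		-- filler to pad each row out
    );

    CREATE CLUSTERED INDEX CIX_MyBigTable ON dbo.MyBigTable (c1);

    -- Clustering on an IDENTITY column instead would make the inserts mostly sequential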

    The 3rd SQL script performs the inserts, and is copied from a related post by Thomas Kejser (blog) – I just added a bit that works out how many rows to insert based on the specified target database size.

    SELECT @counter = 0, @start = GETDATE();

    WHILE (@counter < @no_of_rows_to_insert_per)
    BEGIN
    	IF @counter % 1000 = 0
    		BEGIN TRAN;

    	INSERT INTO MyBigTable DEFAULT VALUES;
    	SET @counter = @counter + 1;

    	IF @counter % 1000 = 999
    		COMMIT TRAN;
    END;

    -- Commit any trailing open transaction before recording the result
    IF @@TRANCOUNT > 0
    	COMMIT TRAN;

    SET @end = GETDATE();
    INSERT INTO Results (duration) VALUES (CAST(DATEDIFF(SS, @start, @end) AS INT));

    The 4th script is directly from this post of Paul’s on wait stats – it’s definitely one worth adding to your toolbox.

    Finally, the 5th script aggregates each test run’s results into one row, so that the test rig can automatically blow everything away and run again without losing any results.
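
    It boils down to something like this (a simplified sketch – the summary table and column names below are placeholders, not what’s in the actual script):

    -- Simplified sketch only: the summary table and column names are placeholders
    -- Collapse the per-process rows in Results into a single summary row that lives
    -- outside the test database, so it survives the database being dropped
    INSERT INTO TestResultsDB.dbo.TestRunSummary (run_date, no_of_processes, avg_duration_sec, max_duration_sec)
    SELECT	GETDATE(),
    	COUNT(*),
    	AVG(1.0 * duration),
    	MAX(duration)
    FROM dbo.Results;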

    One of the biggest challenges I faced (as the PowerShell n00b that I am) was figuring out how to asynchronously launch multiple processes and wait for them to finish. In “batch file land” I would’ve just thrown together some “FOR %%i IN… DO START sqlcmd….” magic, but I wouldn’t allow myself to use anything but PowerShell for the flow control. After some time spent searching around I found Start-Job, which allowed me to launch a given number of instances of SQLCMD and then wait for them all to finish by calling Wait-Job (obviously!). 😉

    ###################################################################### 
    # Kick off the specified number of insert processes ($NoOfInsertProcesses) 
    ###################################################################### 
    $sj = {
    	param($SQLInstanceName, $TestDBName, $InsertRowsPath, $TargetDbSizeGB, $ApproxRowByteSize, $NoOfInsertProcesses)
    	sqlcmd -S "$SQLInstanceName" -d "$TestDBName" -i "$InsertRowsPath" -v TargetDbSizeGB=$TargetDbSizeGB ApproxRowByteSize=$ApproxRowByteSize NoOfInsertProcesses=$NoOfInsertProcesses
    }

    # Wait for all insert processes to finish before continuing
    $(for ($x = 1; $x -le $NoOfInsertProcesses; $x++) {
    	Start-Job $sj -ArgumentList $SQLInstanceName, $TestDBName, $InsertRowsPath, $TargetDbSizeGB, $ApproxRowByteSize, $NoOfInsertProcesses
    }) | Wait-Job

    Funnily enough, one of my main “Aha!” moments was had while reading this post by Aaron Bertrand (blog | twitter), who isn’t your typical hard-core PowerShell or Windows Server blogger, but instead a SQL Server MVP – so either it was a sign I was on the right path, or Google knows far too much about my reading habits. 🙂

    Okay – please feel free to download the scripts, tear them apart, point and laugh, etc (although, if you point and laugh, please at least tell me why so I can learn from it). As I said before, I haven’t run them on anything apart from my laptop (which I think would’ve taken several years to perform the full-blown test), so I’d be interested to hear of your experiences.

    IOTestScripts

    In the coming weeks I’m hoping to benchmark several of our existing servers (sitting on top of various storage platforms, such as SAN – both solid state & fibre channel, local disk, NAS, and RAM disks).

    Cheers,

    DB Dave

  • Do you want a job? Then don’t do this…

    We’re always looking for good “database folk” (i.e. DBAs or database developers). It’s fair to say that the pickings have been pretty slim for some time now, but we still get a constant trickle of applications that make me want to laugh (and sometimes cry).

    Here are a few “pain-points” I’ve come across recently in our recruiting efforts:


    Dnt us txt-spk in yr application

    While writing in “txt-speak” may convey to your potential future employer that you’re young and in touch with modern pop-culture, it doesn’t instil a great deal of confidence in your intellectual abilities. I like to know that you can actually spell “thanks” (hint, it doesn’t have an “x” in it).

    Don’t just use a crappy cover-letter template

    I’ve had a few cases of déjà vu where several candidates have had remarkably similar letters, if not exactly the same. I’ve even had one applicant leave in the template place-holders… I’m sorry, we don’t currently have an opening for a [job-title] here at [company-name].

    Make sure you send the right stuff

    I’m all for you customising your cover-letters, and even CVs, to reflect the position you’re applying for, but make 110% sure that you send me the right ones. If you’re applying for a “Junior DBA” role at “Company A”, make sure your cover-letter doesn’t thank me for considering your application for the position of “PHP Developer” at “Company B” (which, by the way, is way cooler than Company A).

    Read the advertisement

    Seems simple enough, right?  Just double check that the company isn’t asking for something in addition to the usual cover-letter & CV combo.  We often ask candidates to answer several questions in their cover-letters.  These questions usually relate to the position or the company, and help to show us how you think, how much effort you’re willing to put into your application, etc.  Needless to say, you may get a few “negative points” if you overlook the questions entirely.


    Are there any I’ve missed? Anything you don’t agree with? Leave me a comment with your thoughts.

    Cheers
    Dave

  • Public speaking jitters

    The week before Easter, a colleague and I presented at the Wellington SQL Server User Group. It was a first for each of us, and it’s fair to say neither of us was jumping with joy at the idea of it. To ease into it we did a few things which I think helped make it go pretty well, which I’ll run through below:

    1. Most obviously, we did a joint presentation (strength in numbers, or something) 🙂
    2. Our topic wasn’t an in-depth technical one, but rather a “day in the life” type of talk, which made it easier because essentially we were talking about the stuff we do every day, with a few interesting tidbits thrown in to keep people from nodding off.
    3. We prepared reasonably well beforehand. But we didn’t have any dry runs before the real deal because I took the week leading up to the presentation off to go on holiday! Masterful timing, I must say.
    4. I mentioned it before, but it’s worth mentioning again: it’s important to have some lighter/interesting stuff thrown in to keep the content at least slightly more entertaining than watching paint dry.

    So yeah, it went well. We had some good feedback, as well as some constructive criticism (made even more constructive by the fact that it was delivered over several beers), so I’m keen to keep the ball rolling and will try to organise some more speaking opportunities soon.

    If you’re thinking about making the jump yourself, I can highly recommend it. In the immortal words of Nike’s advertising agency: “Just do it!”

    Cheers
    DB Dave