Category: SQL Server

  • PASS Summit trip wrap-up

    I left for the US just over 2 weeks ago to attend the PASS Summit in Seattle, followed by a week in San Francisco doing the tourist thing.

    Basically, the Summit was awesome. The pre-con sessions were amazing (and intense!) – I think my brain was physically stretched during Itzik Ben-Gan’s session on advanced T-SQL.

    I attended quite a few level-400 & 500 sessions on SQL Server internals, memory management, etc., as well as some “peripheral” topics like Event Notifications and Bob Ward’s excellent “Inside TempDB” session.

    I got along to the networking dinner at Lowell’s on Monday night, where I met & had dinner (and a few drinks) with Sarah Barela (blog|twitter) and Andrew Calvett (blog|twitter), which was a nice start to the week. I also managed to see a little of Seattle on the weekends before and after the Summit, and got some nice photos along the way.

    I’ve definitely arrived back at work feeling re-energised, and with lots of new ideas about different ways of monitoring our systems, troubleshooting existing problem areas, and how to better architect new features in future.

    After PASS, I flew down to San Francisco where I had decided to stay for a week (it’s a long way to travel from New Zealand to the US just for 1 week).  I hired a Harley Fat Boy for 4 days and got out around Northern California a bit, which was awesome.  No planned routes, just a rough idea, and I rode each day until I was tired and then looked for a motel to stay the night.

    I ended up travelling from San Francisco up the coast to Monte Rio, then up through Fort Bragg and around to Clear Lake, then across to Lake Tahoe, down to Placerville, and back to San Francisco by Thursday afternoon.

    An amazing experience, and one I’ll definitely try to do again (maybe with my wife next time).

     

    The one very sad thing that happened while I was away, though, was that South Africa were knocked out of the Rugby World Cup. 🙁 But at least New Zealand ended up winning it, so I can kinda half claim that as a home-team victory I guess.  😉

     

    Cheers,
    Dave

  • How’s the weather in Seattle?

    PASS Summit

    I’m fortunate enough to be heading to Seattle this weekend for the PASS Summit courtesy of my employer (thanks again!). 🙂 It’s my first visit to the US as well, so I’m extra-excited!

    The weather here (in Wellington, New Zealand) is pretty average at the moment – I still haven’t completely dried out after this evening’s motorcycle ride home.  So I really hope the weather in Seattle is going to be better next week.

    As for the Summit itself, I think I’ve prepared as well as possible:

    I’m going to try to wrap up any loose ends at work tomorrow, and will then have a think about what I specifically want to get out of the Summit, and any technical issues I can raise with the many smart people I’m likely to bump into over the course of next week.

    After the Summit

    How’s the weather in San Francisco? ‘Cos that’s where I’m heading for the week after the conference.  I fly into San Fran next Saturday evening, will spend Sunday poking around the city a little, and then on Monday morning I’m off for 4 days on a rented Harley Fat Boy. Very much looking forward to that!  I’m planning to ride a circular route east of San Francisco, maybe as far as Carson City, and down as far as Fresno on the return leg.

    I’ve spoken to someone who did a very similar ride, so have some idea of distances and how far I can expect to ride each day – but I won’t really know until I’m there, in the saddle so to speak.  I’ve even been growing out the beard a bit to complete the leather-clad ensemble.

    Awesome… 😉

    Bad timing

    The only bad bit of luck I’ve run into so far is that my return flight in 2 weeks’ time is at the same time as the final of the Rugby World Cup. *

    This is a problem. In case you didn’t already know, I’m a South African, and South Africa is home to the Springboks (aka the best rugby team in the world).  I grew up on a diet of rugby, I played rugby at school, I played rugby after school… you get the picture.  In other words, I have high hopes that South Africa will win the World Cup (again).

    So please, when I get off the plane in Auckland, could the entire country act as if nothing’s happened until after I watch the recorded game? Thanks.

    * For those of you in countries unfortunate enough to not play the game, rugby is kind of like football, except the ball is oval, you carry it instead of dribbling it along the ground, and the players are real men (not drama-queen-cry-baby-pussies). And yes America, you guys do actually have a team playing in the World Cup.

    South Africa - 2007 World Cup Winners

    Cheers,
    DB Dave

  • SQL Server word-level bigram function

    I work for an online auction website, Trade Me, and as such collect quite a large quantity of search data (i.e. full text searches that people perform).  Over the years I’ve been involved in various aspects of our search; from tweaking the nuts and bolts of the full-text search itself (see my blog posts about boolean search operators for example) to working on our recommendation algorithm, and implementing search auto-suggest.

    Something I’m looking into at the moment is changing the way we recommend alternative search terms when we think you’ve misspelt something – i.e. our “did-you-mean” system.

    Did you mean…?

    The way we currently do it is based on taking the individual words entered, generating a large number of alternative spellings for each of them, and then comparing those lists of words against a “known-good” list of search terms and returning the most likely match for each word entered (according to a magic score based on frequency, average results per search, etc.).  This is done on the fly and is surprisingly quick (it’s all done within T-SQL functions and procedures), but I want to experiment with word-level n-grams (unigrams, bigrams, and trigrams specifically).

    Basically, this involves taking each search string and splitting it into individual words to give you your unigrams, and then pairing up each word with the one that follows it to give you your bigrams.

    For example, the string “pink ipod nano” would be returned as:

    Word 1    Word 2
    ------    ------
    #         pink
    pink      ipod
    ipod      nano
    nano      #

    The hash symbols are delimiters and are sometimes not used, in which case the 1st and last rows of this table wouldn’t be returned. In my function below I use a less common symbol (the “Eszett”, ß) as my delimiter, and have a parameter that lets the delimiter rows be omitted if necessary.

    What this allows you to do is relatively easily store a large set of known n-grams and their occurrence frequencies (i.e. how common they are), and then predict either what word the user is likely to type next (in the case of an auto-suggestion system), or what the user actually meant to type in the case of a misspelling.

    For example, if a user types “ipod” followed by a space into the search box, then given a large enough working dataset you should be able to predict what the next word is likely to be (even more so as they start typing the first few letters).  Or, if the same user types in “ipod nqno”, then we should be able to work out that firstly “nqno” isn’t a common word according to our list of unigrams, and secondly that the word “ipod” followed by a space and “n” is most likely meant to be “ipod nano” – and we can then recommend that as an alternative search.
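
    To make that a little more concrete, here’s a rough sketch of the kind of lookup involved. The table and column names below are made up purely for illustration – this isn’t our actual schema:

    -- hypothetical table of word-level bigrams harvested from past searches
    CREATE TABLE dbo.search_bigrams (
        word1     NVARCHAR(100) NOT NULL,
        word2     NVARCHAR(100) NOT NULL,
        frequency INT           NOT NULL,  -- how often this word pair has been searched
        CONSTRAINT PK_search_bigrams PRIMARY KEY (word1, word2)
    );
    
    -- given the previous word and whatever has been typed so far,
    -- suggest the most common next words
    DECLARE @previous_word NVARCHAR(100) = N'ipod',
            @typed_so_far  NVARCHAR(100) = N'n';
    
    SELECT TOP (5)
            word2,
            frequency
    FROM    dbo.search_bigrams
    WHERE   word1 = @previous_word
    AND     word2 LIKE @typed_so_far + N'%'
    ORDER BY frequency DESC;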

    Obviously this is very high level, and there are many, many other factors and variables to take into account (which is what makes it interesting!) – but it gives you a little background to the original point of this blog post; sharing my bigram function. 🙂

    The guts

    It’s relatively self-explanatory, and would likely be rewritten as a CLR function before going into production, but having a T-SQL function allows me to test the theory myself before asking a developer to spend any time on it.

    -- =============================================
    -- Author:		David Curlewis
    -- Create date: 2011-09-08
    -- Description:	Splits a string and returns word-level bigrams (i.e. all pairs of words making up the string)
    -- =============================================
    CREATE FUNCTION dbo.fn_word_bigrams (
        @string             NVARCHAR(1000),
        @include_delimiters BIT = 1 -- whether to include rows for prefix and suffix delimiters
    )
    RETURNS @bigrams TABLE (
        row_id  INT, 
        word1   NVARCHAR(100),
        word2   NVARCHAR(100) 
    )
    AS
    BEGIN
        SET @string = ltrim(rtrim(@string)) 
            + CASE @include_delimiters WHEN 1 THEN ' ß' ELSE '' END;
    
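    -- NOTE: dbo.SplitString_Multi is a user-defined string splitter (not included
    -- here); it returns one row per space-delimited item in the input string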
        WITH split_words AS
        (
            SELECT  ROW_NUMBER() OVER(ORDER BY @@SPID) AS row_id,
                    item AS word
            FROM    dbo.SplitString_Multi(@string, ' ')
        )
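    -- recursive CTE: pair each word with the word that follows it,
    -- starting from the ß delimiter + first word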
        , bigrams AS 
        (
            SELECT  row_id,
                    CAST('ß' AS NVARCHAR(4000)) AS word1, -- ß is our delimiter
                    CAST(word AS NVARCHAR(4000)) AS word2
            FROM    split_words
            WHERE   row_id = 1
            UNION ALL
            SELECT  b.row_id + 1 AS row_id,
                    CAST(b.word2 AS NVARCHAR(4000)) AS word1, 
                    CAST(w.word AS NVARCHAR(4000)) AS word2
            FROM    split_words AS w
            JOIN    bigrams AS b ON w.row_id = b.row_id + 1
        )
        INSERT  @bigrams
        SELECT  row_id,
                word1,
                word2
        FROM    bigrams
        WHERE   row_id > CASE @include_delimiters WHEN 1 THEN 0 ELSE 1 END;
        
        RETURN; 
    END
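
    For what it’s worth, here’s how I call it when testing (assuming the dbo.SplitString_Multi splitter it depends on exists); the second parameter controls whether the ß delimiter rows are returned:

    -- word pairs including the ß prefix/suffix delimiter rows
    SELECT row_id, word1, word2
    FROM   dbo.fn_word_bigrams(N'pink ipod nano', 1);
    
    -- word pairs only, no delimiter rows
    SELECT row_id, word1, word2
    FROM   dbo.fn_word_bigrams(N'pink ipod nano', 0);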

    Shout if you have any questions or comments.

    Cheers,

    DB Dave

  • Tech Ed done, bring on PASS Summit!

    I’m feeling somewhat human again, having caught up on some sleep back at home after 4 days in Auckland at Tech Ed New Zealand.

    Thanks again to those who came to our session (DBI207 – Trade Me: 0-70 million pages in 0.34 Gigaseconds) – and especially to those who took the time to leave feedback.  Overall we got a good response, with the only gripe being that it wasn’t technical enough.  Probably a fair enough observation.

    It was classified as a level-200 session, but that level was set before we actually knew exactly what we were going to present on – so in hindsight level 100 might’ve been more accurate. I also spent a little longer on some of the preamble than I wanted to, which meant the more technical stuff had to be rushed a little. This is something that will improve with speaking experience, I think.  So, if you didn’t get what you wanted from it, I apologise, but I’m keen to do a technical session at a later date so keep an eye out for that at a conference or user group near you. 🙂

    Simon, another work colleague, also presented a session (WUX207 – Releasing Trade Me’s Latest Project) which was awesome. My excuse is that he’s spent a fair amount of time on stage in a past life doing improv. :-p  Needless to say there was some healthy competition between Matt & me, and Simon as far as feedback scores were concerned.  He won…

    The rest of the conference seemed to go pretty well – congrats to those involved in organising it.  I didn’t get to see as many sessions as I wanted to, but enjoyed the few I did attend, and heard mostly positive feedback from the other guys from work who attended.

    Here are all the recorded sessions from this year, including our session, and Simon’s.

    So, the next thing on my calendar is PASS Summit in Seattle in October. Super excited! Never been to the US before, and can’t wait to attend pre-cons and sessions presented by folks who I’m sure will explode my brain. I’ve signed up as a “first timer” (my “big brother” is Craig Purnell), and I’ll probably also be up for the Monday night dinner at Lowell’s, so might see you there!

    Cheers,
    DB Dave

  • Getting ready for Tech Ed New Zealand

    Sky Tower, Auckland

    Tech·Ed NZ kicks off the day after tomorrow at the Auckland SkyCity convention centre.  I’ve been to a few in the past, so nothing new there – but what makes this year different is that a colleague and I will be presenting a session:

    DBI207 – Trade Me: 0-70 million pages in 0.34 Gigaseconds

    It’s not a terribly technical session, but rather a session about our (i.e. my employer’s) infrastructure & architecture.  Given the nature of our architecture it is a fairly database-centric session, although we’ll cover the network and application layers briefly as well. It’s at 13:55 on Wednesday if I’ve whetted your appetite.
    Oh, and if you don’t know what a “gigasecond” is, then Wikipedia is your friend.  It sounded better than “0-70 million pages in roughly 11 years”. 😉

    It’s a tweaked and updated version of the session I presented at the Wellington SQL Server User Group (along with my colleague, Moana), then at the Christchurch .Net User Group, and most recently the Auckland SQL Server User Group.

    Simon, our head of dev, is also presenting a session this year. His will be about our development and deployment processes, culminating in the live launch of a new Trade Me “thing” (I can’t really describe it any more accurately without hinting at what it is, and it’s super top secret squirrel and I’ll be shot if I let it slip early). His session is at 13:55 on Thursday, so it’ll be worth a look-see.

    So yeah, hopefully it all goes well! If you’re coming along be sure to say hi before or after the session. There will be around 10 of us from Trade Me attending this year, so the chances of bumping into one of us are fairly good. 🙂

    Cheers
    DB Dave

  • T-SQL Tuesday #21 – Temporarily permanent hacks

    This month’s T-SQL Tuesday Wednesday is hosted by Adam Machanic (blog|twitter) – who in fact started the T-SQL Tuesday concept himself back in 2009 (read more about the genesis here).

    This month it’s about “crap code”.
    We’ve all seen it, and we’ve all written our fair share of it.  I’m not going to concentrate too much on actual crappy code – like using deprecated syntax or functions, lots of GOTO logic within cursors (always bags of fun), etc. – but will rather look at some of the other steaming piles of “legacy” we leave behind us in the course of our busy DBA lives. None of these is necessarily going to take your production system down, or cause world war 3, but they’re bloody annoying at least.

    Some of my favourite examples of such landmines include:

    1. “Temporary” logging tables
    2. Renamed or “backup” tables
    3. Hard-coded values in stored procedures
    4. Scheduled job sprawl (or bloat)

    “I’ll just add some logging while I troubleshoot this bug.”

    Take one stored procedure, add a handful of intermittent business rule bugs, and you’ve got the perfect excuse to add a new logging table, and some logging code to the procedure. Something similar to this perhaps:

    create table dc_logging_tmp (dt datetime, m_id int, bal money);
    
    IF @member_id IN (6763, 3884, 8734) AND @ledger_balance < 0
    INSERT dc_logging_tmp (m_id, bal) VALUES(@member_id, @ledger_balance);

    Of course the table name is generic, there are no indexes, and the logging code has no comments explaining its existence – but you’ll only need it for a few hours, maybe a day or 2 at most, and then you’ll rip it all out again. Right?

    If you really only need something short-term, and there’s a chance you won’t remember to tidy up after yourself (and it won’t break anything if the table doesn’t exist), then consider creating your logging objects in TempDB.  Next time you restart your SQL service: Poof!! All gone.
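
    For example, the same quick-and-dirty logging as above, just pointed at tempdb (a rough sketch – adjust to taste):

    -- same throwaway logging table, but it now lives in tempdb,
    -- so it disappears the next time the SQL service restarts
    IF OBJECT_ID('tempdb.dbo.dc_logging_tmp') IS NULL
        CREATE TABLE tempdb.dbo.dc_logging_tmp
            (dt DATETIME DEFAULT GETDATE(), m_id INT, bal MONEY);
    
    IF @member_id IN (6763, 3884, 8734) AND @ledger_balance < 0
        INSERT tempdb.dbo.dc_logging_tmp (m_id, bal)
        VALUES (@member_id, @ledger_balance);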

    “Just to be safe I’ll create a copy of this table before we deploy this application update.”

    You’re about to do a deploy to production which involves making some data changes. So, being the sensible, risk-averse DBA that you are, you decide to take a snapshot of the current table and do a simple “SELECT * INTO” before the update.

    Of course you’ll remember to drop this temporary backup table tomorrow once you’re sure you won’t need to roll back the update, so calling it [zzz_table_name] seems reasonable. Right?  Aaah, no. There’s nothing stopping you from creating tables with names like [ledger_pre_Jul2011_v3_deploy_drop_after_3_weeks].
    Okay, that might be extreme, but you get the idea.  If you feel like being especially thorough you could even make use of extended properties to add comments, descriptions, or whatever else you like.

    CREATE TABLE TableWithDescription (col INT)
    GO
    
    EXEC sp_addextendedproperty
    @name = N'Description',
    @value = 'This table is for XYZ, blah blah blah, etc...',
    @level0type = N'Schema', @level0name = 'dbo',
    @level1type = N'Table', @level1name = 'TableWithDescription';
    GO
    
    SELECT name, value
    FROM fn_listextendedproperty (
    'Description', 'Schema', 'dbo', 'Table',
    'TableWithDescription', NULL, NULL
    );
    GO

    Here’s a little more information on extended properties from Glenn Berry (twitter|blog).

    “It’ll just be easier to hard-code these hourly rate values in this procedure.”

    The thing about writing maintainable code is that it should be easy to maintain (the clue is in the name, you see). You shouldn’t need to alter a dozen stored procedures if your company’s rates change – things like that should be stored in a lookup table, or (an option I quite like) in table-valued functions or views. You should obviously still test and choose what’s best suited to your particular situation.

    The main advantage of using a TVF or view, in my opinion, is that you can make use of source control and versioning (which is more difficult to do with data stored in a table). This article by Joe Celko gave me a few new ideas in this regard.
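
    As a completely made-up illustration of that last option, the rates could live in an inline table-valued function that gets versioned along with the rest of your code, and procedures simply select from it:

    -- hypothetical example: rates defined once, in source control,
    -- instead of being hard-coded in a dozen stored procedures
    CREATE FUNCTION dbo.fn_hourly_rates ()
    RETURNS TABLE
    AS
    RETURN
    (
        SELECT rate_code = 'STANDARD',    hourly_rate = CAST(150.00 AS MONEY)
        UNION ALL
        SELECT rate_code = 'AFTER_HOURS', hourly_rate = CAST(225.00 AS MONEY)
    );
    GO
    
    -- callers join to or select from the function rather than embedding the values
    SELECT hourly_rate
    FROM   dbo.fn_hourly_rates()
    WHERE  rate_code = 'STANDARD';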

    Unfortunately I still find plenty of examples of developers including static values in stored procedures. Remember kids: every time you get lazy and hard-code something that you shouldn’t, a kitten loses its wings.

    “I need this stored proc to run every day. Could you just quickly create a scheduled job for me?”

    How many scheduled jobs do you have? Do you know exactly what every single one of them is doing? If you do, then I’m jealous.

    Environments that have been around for a while tend to suffer from scheduled job build-up.  This can rear its head in one of two ways: either you end up with a million separate jobs (i.e. job sprawl), or a few jobs, each with a hundred steps (i.e. job bloat).

    My advice in combatting this is to decide how you’re going to group your jobs (e.g. by schedule, function, target database, etc.), and then stick to it.  Also remember to religiously maintain your job names and descriptions, schedule names, and step names – this makes your jobs essentially self-documenting (you could even write a few simple queries to actually generate some documentation for you!).
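
    For instance, a bare-bones starting point for that self-generated documentation could be nothing more than a query against msdb:

    -- list every job, its description, and its steps, straight out of msdb
    SELECT  j.name        AS job_name,
            j.description AS job_description,
            s.step_id,
            s.step_name,
            s.subsystem
    FROM    msdb.dbo.sysjobs     AS j
    JOIN    msdb.dbo.sysjobsteps AS s ON s.job_id = j.job_id
    ORDER BY j.name, s.step_id;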

    Well, that’s about enough from me on the topic of crap. I’d love to hear your feedback on any of the examples I’ve covered here, and feel free to tell me about your own crap (or the crap you’ve had to deal with).

    Cheers
    DB Dave