Author: DB Dave

  • Preventing identity overflow

    Here’s a stored procedure I put together a week or so ago that I thought others might find useful.

    Facepalm

    We had an issue which should really never be allowed to happen (but as they say, “shit happens”); a rather important table ran out of integer values for the identity column.  Luckily the actual value of the ID column on this table isn’t important because the table only stores the most recent month’s data, and everything before that is archived – so we could simply reseed the identity value back to 0 (which had long since been purged from this table).
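
    If you’ve never had to do it, the reseed itself is a one-liner. The table name below is made up purely for illustration, and this was only safe because rows with the old, low ID values were long gone from the table:

    -- Hypothetical example: reseed so the next row inserted gets an identity value of 1.
    -- Only safe because rows with the old, low ID values have long since been purged.
    DBCC CHECKIDENT ('dbo.SomeBusyTable', RESEED, 0);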

    To prevent this in future I put together a pretty simple query to grab all identity values across the server and compare them against the maximum value for their given data types. This eventually became a procedure which is now just part of my “utility” database which I deploy to every production server.
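
    The procedure itself is below. If you just want a rough idea of the kind of check involved, something like this gives you a quick single-database snapshot first – it’s only a sketch, using sys.identity_columns.last_value rather than the MAX()-based approach the procedure takes:

    -- Rough single-database sketch (not how the procedure below works):
    -- compare each identity column's last value against its data type's maximum.
    SELECT  QUOTENAME(SCHEMA_NAME(t.[schema_id])) + '.' + QUOTENAME(t.name) AS table_name,
            ic.name AS column_name,
            typ.name AS data_type,
            CAST(ic.last_value AS BIGINT) AS last_value,
            CAST(100.0 * CAST(ic.last_value AS BIGINT) /
                 CASE typ.name
                    WHEN 'tinyint'  THEN 255
                    WHEN 'smallint' THEN 32767
                    WHEN 'int'      THEN 2147483647
                    WHEN 'bigint'   THEN 9223372036854775807
                 END AS INT) AS percent_of_max
    FROM    sys.identity_columns ic
    JOIN    sys.tables t ON ic.[object_id] = t.[object_id]
    JOIN    sys.types typ ON ic.system_type_id = typ.system_type_id
    WHERE   t.is_ms_shipped = 0
    AND     typ.name IN ('tinyint','smallint','int','bigint')
    AND     ic.last_value IS NOT NULL
    ORDER BY percent_of_max DESC;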

    IF OBJECT_ID('dbo.usp_check_max_identity_value') IS NULL
        EXEC('CREATE PROC dbo.usp_check_max_identity_value AS ');
    GO

    ALTER PROC dbo.usp_check_max_identity_value (
        @alert_threshold        TINYINT       = 60,   -- The percentage of the max value over which we will alert.
        @send_alert_as_email    BIT           = 0,    -- Do you want to send any alerts as an email? (Well... do ya, punk?)
        @alert_email_recipients NVARCHAR(MAX) = NULL  -- Semicolon-delimited list of email recipients.
    ) AS
    BEGIN
        SET NOCOUNT ON;
        SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

        DECLARE @srv_name       VARCHAR(200),
                @db_name        VARCHAR(200),
                @table_name     VARCHAR(200),
                @column_name    VARCHAR(200),
                @data_type      VARCHAR(200),
                @max_value      BIGINT,
                @max_tinyint    TINYINT,
                @max_smallint   SMALLINT,
                @max_int        INT,
                @max_bigint     BIGINT,
                @cmd            NVARCHAR(4000),
                @percent_of_max INT,
                @alert_subject  NVARCHAR(255),
                @alert_body     NVARCHAR(MAX),
                @alert_counter  INT;

        DECLARE @tt_alert_data TABLE (
            row_id          INT IDENTITY,
            [db_name]       VARCHAR(200),
            table_name      VARCHAR(200),
            column_name     VARCHAR(200),
            data_type       VARCHAR(200),
            percent_of_max  INT
        );

        -- These are the max values of each integer data type
        SELECT  @max_tinyint  = 255,
                @max_smallint = 32767,
                @max_int      = 2147483647,
                @max_bigint   = 9223372036854775807;

        -- Get a list of all available user databases to loop through
        DECLARE db_cursor CURSOR LOCAL STATIC FORWARD_ONLY READ_ONLY FOR
        SELECT  DISTINCT QUOTENAME(name)
        FROM    sys.databases
        WHERE   database_id > 4
        AND     is_read_only = 0
        AND     state_desc = 'ONLINE';

        OPEN db_cursor;
        FETCH NEXT FROM db_cursor INTO @db_name;
        WHILE @@FETCH_STATUS = 0
        BEGIN
            -- Get a list of all user tables in the current database
            SET @cmd = N'
                DECLARE tables_cursor CURSOR GLOBAL STATIC FORWARD_ONLY READ_ONLY FOR
                SELECT DISTINCT
                        QUOTENAME(SCHEMA_NAME(t.[schema_id])) + ''.'' + QUOTENAME(t.name) AS table_name,
                        QUOTENAME(c.name) AS column_name,
                        typ.name AS data_type
                FROM    ' + @db_name + '.sys.tables t
                JOIN    ' + @db_name + '.sys.columns c ON t.[object_id] = c.[object_id]
                JOIN    ' + @db_name + '.sys.types typ ON c.system_type_id = typ.system_type_id
                WHERE   t.[type] = ''U''
                AND     t.is_ms_shipped = 0
                AND     c.is_identity = 1
                AND     typ.name IN (''tinyint'',''smallint'',''int'',''bigint'')
                ORDER BY QUOTENAME(SCHEMA_NAME(t.[schema_id])) + ''.'' + QUOTENAME(t.name);
            ';

            EXEC sp_executesql @cmd, N'';

            -- Loop through the list of tables
            OPEN tables_cursor;
            FETCH NEXT FROM tables_cursor INTO @table_name, @column_name, @data_type;
            WHILE @@FETCH_STATUS = 0
            BEGIN
                SELECT @max_value = NULL, @percent_of_max = NULL;
                SET @cmd = N'SELECT @max = MAX(' + @column_name + ') FROM ' + @db_name + '.' + @table_name + ' WITH (NOLOCK);';

                -- Run the SELECT to fetch the maximum value from the IDENTITY column
                EXEC sp_executesql @cmd,
                    N'@max BIGINT OUTPUT',
                    @max = @max_value OUTPUT;

                IF @max_value IS NULL GOTO FetchNext;

                -- Work out the percentage this represents of the maximum possible value
                SELECT @percent_of_max = CAST(CASE
                    WHEN @data_type = 'tinyint'  THEN ((CAST(@max_value AS FLOAT) / CAST(@max_tinyint AS FLOAT)) * 100.0)
                    WHEN @data_type = 'smallint' THEN ((CAST(@max_value AS FLOAT) / CAST(@max_smallint AS FLOAT)) * 100.0)
                    WHEN @data_type = 'int'      THEN ((CAST(@max_value AS FLOAT) / CAST(@max_int AS FLOAT)) * 100.0)
                    WHEN @data_type = 'bigint'   THEN ((CAST(@max_value AS FLOAT) / CAST(@max_bigint AS FLOAT)) * 100.0)
                    END AS INT);

                -- If the percentage is greater than our threshold, raise an alert
                IF (@percent_of_max > @alert_threshold)
                BEGIN
                    -- Insert the values for use later (if we're going to send an email)
                    IF (@send_alert_as_email = 1)
                    BEGIN
                        INSERT @tt_alert_data ([db_name], table_name, column_name, data_type, percent_of_max)
                        VALUES (@db_name, @table_name, @column_name, @data_type, @percent_of_max);
                    END

                    RAISERROR(
                        '%s - Identity column %s on table %s is currently at %i%% of an %s''s maximum value.',
                        10,
                        1,
                        @db_name,
                        @column_name,
                        @table_name,
                        @percent_of_max,
                        @data_type
                    ) WITH LOG;

                    SET @alert_counter = ISNULL(@alert_counter, 0) + 1;
                END

            FetchNext:
                FETCH NEXT FROM tables_cursor INTO @table_name, @column_name, @data_type;
            END

            CLOSE tables_cursor;
            DEALLOCATE tables_cursor;

            FETCH NEXT FROM db_cursor INTO @db_name;
        END

        CLOSE db_cursor;
        DEALLOCATE db_cursor;

        -- If we want to send an alert email (and there is one to send), then let's do that...
        IF (@send_alert_as_email = 1) AND (ISNULL(@alert_counter, 0) > 0)
        BEGIN
            -- If we haven't provided an email recipient, just carry on regardless
            IF @alert_email_recipients IS NULL RETURN;

            SET @srv_name = QUOTENAME(CAST(@@SERVERNAME AS VARCHAR(200)));
            SET @alert_subject = @srv_name + ' - Identity value overflow warning - ' + CAST(@alert_counter AS VARCHAR) + ' columns found.';

            -- Build the email body (HTML tags for the benefit of making the email pretty, but not necessary)
            SET @alert_body = '<html><body><style type="text/css"> h3, p, table {font-family:verdana;}</style>';
            SET @alert_body = @alert_body +
                N'<H3>Identity value overflow warning</H3>' +
                N'<p>This alert means that the following tables on the server ' + @srv_name
                + ' have integer-based IDENTITY column values which are currently greater than <strong>'
                + CAST(@alert_threshold AS VARCHAR) + '%</strong> of their respective data-types'' maximum possible values.</p>' +
                N'<p>This is just a warning, but you may want to consider increasing the size of the data-type you use (if possible) '
                + 'or reseeding the table (if the IDENTITY value itself is not important, other than it being unique).</p>' +
                N'<table border="1" cellspacing="0" cellpadding="3">' +
                N'<tr style="background-color:#c9cfff;"><th>Database</th><th>Table</th><th>Column</th><th>Type</th><th>%</th></tr>' +
                CAST((  SELECT  td = [db_name], '',
                                td = [table_name], '',
                                td = [column_name], '',
                                td = [data_type], '',
                                td = [percent_of_max]
                        FROM    @tt_alert_data
                        ORDER BY row_id
                        FOR XML PATH('tr'), TYPE
                ) AS NVARCHAR(MAX)) +
                N'</table></body></html>';

            -- Send the email (NB: This assumes that you have a default Database Mail profile configured)
            EXEC msdb.dbo.sp_send_dbmail
                @recipients  = @alert_email_recipients,
                @subject     = @alert_subject,
                @body        = @alert_body,
                @body_format = 'HTML',
                @importance  = 'High';
        END
    END
    GO

    It’s not very complex – it iterates through all user databases on the server, and then over all user tables that have an identity column, and simply selects the maximum values from each of those identity columns.  This max value is then compared to the max value for the respective data type of the column (i.e. tinyint, smallint, int, or bigint).

    What it does after that is up to you. In this case I send a nicely formatted HTML email listing the offending columns.  Obviously you can customise what action to take, and what your threshold is. And yes, the title of this post is a little misleading since this doesn’t actually prevent anything, but rather alerts you to impending doom.
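
    For what it’s worth, this is roughly how I’d call it from a scheduled job – the threshold and recipient address here are just examples:

    -- Example call: alert (and email) on any identity column past 75% of its type's maximum value.
    EXEC dbo.usp_check_max_identity_value
        @alert_threshold = 75,
        @send_alert_as_email = 1,
        @alert_email_recipients = N'dba.team@example.com';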

    Warning: Before running this on a production system, make sure that you’re happy for it to be selecting from all of your (potentially large and/or busy) tables that have identity columns.  If you have parallel test/dev systems, then run them there instead to prevent any negative performance impact in production. I also take no responsibility for bugs or unexpected behaviour. Consult your GP before taking.

    Cheers

    DB Dave

  • PASS Summit trip wrap-up

    I left for the US just over two weeks ago to attend the PASS Summit in Seattle, followed by a week in San Francisco doing the tourist thing.

    Basically, the Summit was awesome. The pre-con sessions were amazing (and intense!) – I think my brain was physically stretched during Itzik Ben-Gan’s session on advanced T-SQL.

    I attended quite a few level-400 & 500 sessions on SQL Server internals, memory management, etc., as well as some “peripheral” topics like Event Notifications and Bob Ward’s excellent “Inside TempDB” session.

    I got along to the networking dinner at Lowell’s on Monday night, where I met & had dinner (and a few drinks) with Sarah Barela (blog|twitter) and Andrew Calvett (blog|twitter) which was a nice start to the week. I also managed to see a little of Seattle on the weekends before and after the Summit, and got some nice photos along the way.

    I’ve definitely arrived back at work feeling re-energised, and with lots of new ideas about different ways of monitoring our systems, troubleshooting existing problem areas, and how to better architect new features in future.

    After PASS, I flew down to San Francisco where I had decided to stay for a week (it’s a long way to travel from New Zealand to the US just for 1 week).  I hired a Harley Fat Boy for 4 days and got out around Northern California a bit which was awesome.  No planned routes, just a rough idea, and I rode each day until I was tired and then looked for a motel to stay the night.

    I ended up travelling from San Francisco up the coast to Monte Rio, then up through Fort Bragg and around to Clear Lakes, then across to Lake Tahoe, down to Placerville, and back to San Francisco by Thursday afternoon.

    An amazing experience, and one I’ll definitely try to do again (maybe with my wife next time).

     

    The one very sad thing that happened while I was away though was that South Africa was knocked out of the Rugby World Cup. But at least New Zealand ended up winning it, so I can kinda half claim that as a home team victory I guess.

     

    Cheers,
    Dave

  • How’s the weather in Seattle?

    PASS Summit

    I’m fortunate enough to be heading to Seattle this weekend for the PASS Summit courtesy of my employer (thanks again!). It’s my first visit to the US as well, so I’m extra-excited!

    The weather here (in Wellington, New Zealand) is pretty average at the moment – I still haven’t completely dried out after this evening’s motorcycle ride home.  So I really hope the weather in Seattle is going to be better next week.

    As for the Summit itself, I think I’ve prepared as well as possible.

    I’m going to try to wrap up any loose ends at work tomorrow, and will then have a think about what I specifically want to get out of the Summit, and any technical issues I can raise with the many smart people I’m likely to bump into over the course of next week.

    After the Summit

    How’s the weather in San Francisco? ‘Cos that’s where I’m heading for the week after the conference.  I fly into San Fran next Saturday evening, will spend Sunday poking around the city a little, and then on Monday morning I’m off for 4 days on a rented Harley Fat Boy. Very much looking forward to that!  I’m planning to ride a circular route east of San Francisco, maybe as far as Carson City, and down as far as Fresno on the return leg.

    I’ve spoken to someone who did a very similar ride, so have some idea of distances and how far I can expect to ride each day – but I won’t really know until I’m there, in the saddle so to speak.  I’ve even been growing out the beard a bit to complete the leather-clad ensemble.

    Awesome…

    Bad timing

    The only bad bit of luck I’ve run into so far is that my return flight in 2 weeks’ time is at the same time as the final of the Rugby World Cup. *

    This is a problem. In case you didn’t already know, I’m a South African, and South Africa is home to the Springboks (aka the best rugby team in the world).  I grew up on a diet of rugby, I played rugby at school, I played rugby after school… you get the picture.  In other words, I have high hopes that South Africa will win the World Cup (again).

    So please, when I get off the plane in Auckland, could the entire country act as if nothing’s happened until after I watch the recorded game? Thanks.

    * For those of you in countries unfortunate enough to not play the game, rugby is kind of like football, except the ball is oval, you carry it instead of dribbling it along the ground, and the players are real men (not drama-queen-cry-baby-pussies). And yes America, you guys do actually have a team playing in the World Cup.

    South Africa - 2007 World Cup Winners

    Cheers,
    DB Dave

  • SQL Server word-level bigram function

    I work for an online auction website, Trade Me, and as such collect quite a large quantity of search data (i.e. full text searches that people perform).  Over the years I’ve been involved in various aspects of our search; from tweaking the nuts and bolts of the full-text search itself (see my blog posts about boolean search operators for example) to working on our recommendation algorithm, and implementing search auto-suggest.

    Something I’m looking into at the moment is changing the way we recommend alternative search terms when we think you’ve misspelt something – i.e. our “did-you-mean” system.

    Did you mean…?

    The way we currently do it is based on taking the individual words entered, and generating a large number of alternative spellings for each of them, and then comparing those lists of words against a “known-good” list of search terms and returning the most likely one for each word entered (according to a magic score based on frequency, average results per search, etc).  This is done on the fly and is surprisingly quick (it’s all done within TSQL functions and procedures), but I want to experiment with word level n-grams (unigrams, bigrams, and trigrams specifically).

    Basically, this involves taking each search string and splitting it into individual words to give you your unigrams, and then pairing those words up to give you bigrams.

    For example, the string “pink ipod nano” would be returned as:

    Word 1    Word 2
    ------    ------
    #         pink
    pink      ipod
    ipod      nano
    nano      #

    The hash symbols are delimiters and are sometimes not used, in which case the 1st and last rows of this table wouldn’t be returned. In my function below, I use a less common symbol (the “Eszett”) as my delimiter, and have a parameter to allow it to be omitted if necessary.

    What this allows you to do is relatively easily store a large set of known n-grams and their occurrence frequencies (i.e. how common they are), and then predict either what word the user is likely to type next (in the case of an auto-suggestion system), or what the user actually meant to type in the case of a misspelling.

    For example, if a user types “ipod” followed by a space into the search box, then given a large enough working dataset you should be able to predict what the next word is likely to be (even more so as they start typing the first few letters).  Or, if the same user types in “ipod nqno“, then we should be able to work out that firstly “nqno” isn’t a common word according to our list of unigrams, and secondly that the word “ipod” followed by a space and “n” is most likely meant to be “ipod nano” – and we can then recommend that as an alternative search.
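
    To make that a little more concrete, here’s the rough shape of the lookup I have in mind. Note that the dbo.search_bigrams table and its columns are purely hypothetical, and a real score would blend frequency with the other signals mentioned earlier:

    -- Hypothetical lookup: the user has typed 'ipod ' followed by 'n'.
    -- dbo.search_bigrams (word1, word2, frequency) is an assumed table of
    -- pre-aggregated word pairs harvested from past searches.
    DECLARE @previous_word NVARCHAR(100) = N'ipod',
            @typed_prefix  NVARCHAR(100) = N'n';

    SELECT TOP (5)
            b.word2,
            b.frequency
    FROM    dbo.search_bigrams AS b
    WHERE   b.word1 = @previous_word
    AND     b.word2 LIKE @typed_prefix + N'%'
    ORDER BY b.frequency DESC;   -- 'nano' should come out on top for this example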

    Obviously this is very high level, and there are many, many other factors and variables to take into account (which is what makes it interesting!) – but it gives you a little background to the original point of this blog post; sharing my bigram function. 🙂

    The guts

    It’s relatively self-explanatory, and would likely be rewritten as a CLR function before going into production, but having a TSQL function allows me to test my theory myself before asking a developer to spend any time on it.

    -- =============================================
    -- Author:		David Curlewis
    -- Create date: 2011-09-08
    -- Description:	Splits a string and returns word-level bigrams (i.e. all pairs of words making up the string)
    -- =============================================
    CREATE FUNCTION dbo.fn_word_bigrams (
        @string             NVARCHAR(1000),
        @include_delimiters BIT = 1 -- whether to include rows for prefix and suffix delimiters
    )
    RETURNS @bigrams TABLE (
        row_id  INT, 
        word1   NVARCHAR(100),
        word2   NVARCHAR(100) 
    )
    AS
    BEGIN
        DECLARE @rowcount   INT,
                @i          INT;
    
        DECLARE @tt_words   TABLE (row_id INT, word NVARCHAR(100))
        DECLARE @tt_bigrams TABLE (word1 NVARCHAR(100), word2 NVARCHAR(100))
    
        SET @i = 1;
        SET @string = ltrim(rtrim(@string)) 
            + CASE @include_delimiters WHEN 1 THEN ' ß' ELSE '' END;
    
        WITH split_words AS
        (
            SELECT  ROW_NUMBER() OVER(ORDER BY @@SPID) AS row_id,
                    item AS word
            FROM    dbo.SplitString_Multi(@string, ' ')
        )
        , bigrams AS 
        (
            SELECT  row_id,
                    CAST('ß' AS NVARCHAR(4000)) AS word1, -- ß is our delimiter
                    CAST(word AS NVARCHAR(4000)) AS word2
            FROM    split_words
            WHERE   row_id = 1
            UNION ALL
            SELECT  b.row_id + 1 AS row_id,
                    CAST(b.word2 AS NVARCHAR(4000)) AS word1, 
                    CAST(w.word AS NVARCHAR(4000)) AS word2
            FROM    split_words AS w
            JOIN    bigrams AS b ON w.row_id = b.row_id + 1
        )
        INSERT  @bigrams
        SELECT  row_id,
                word1,
                word2
        FROM    bigrams
        WHERE   row_id > CASE @include_delimiters WHEN 1 THEN 0 ELSE 1 END;
        
        RETURN; 
    END
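
    Assuming you have a string-splitting function called dbo.SplitString_Multi that returns an item column (any equivalent splitter will do), you call it like any other table-valued function:

    -- Bigrams for the example string used earlier, with the delimiter rows included.
    SELECT  row_id, word1, word2
    FROM    dbo.fn_word_bigrams(N'pink ipod nano', 1)
    ORDER BY row_id;

    -- Expected output:
    --   row_id  word1  word2
    --   1       ß      pink
    --   2       pink   ipod
    --   3       ipod   nano
    --   4       nano   ß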

    Shout if you have any questions or comments.

    Cheers,

    DB Dave

  • Tech Ed done, bring on PASS Summit!

    I’m feeling somewhat human again after having caught up on some sleep back at home after 4 days in Auckland at Tech Ed New Zealand.

    Thanks again to those who came to our session (DBI207 – Trade Me: 0-70 million pages in 0.34 Gigaseconds) – and especially to those who took the time to leave feedback.  Overall we got a good response, with the only gripe being that it wasn’t technical enough.  Probably a fair enough observation.

    It was classified as a level 200, but that was set before we actually knew exactly what we were going to present on – so in hindsight level 100 might’ve been more accurate. I also spent a little longer on some of the preamble than I wanted to, which meant the more technical stuff had to be rushed a little. This is something that will improve with speaking experience, I think.  So, if you didn’t get what you wanted from it, I apologise, but I’m keen to do a technical session at a later date so keep an eye out for that at a conference or user group near you. 🙂

    Simon, another work colleague, also presented a session (WUX207 – Releasing Trade Me’s Latest Project) which was awesome. My excuse is that he’s spent a fair amount of time on-stage in a past life doing improv. :-p  Needless to say there was some healthy competition between Matt and me, and Simon, as far as feedback scores were concerned.  He won…

    The rest of the conference seemed to go pretty well – congrats to those involved in organising it.  I didn’t get to see as many sessions as I wanted to, but enjoyed the few I did attend, and heard mostly positive feedback from the other guys from work who attended.

    Here are all the recorded sessions from this year, including our session, and Simon’s.

    So, the next thing on my calendar is PASS Summit in Seattle in October. Super excited! Never been to the US before, and can’t wait to attend pre-cons and sessions presented by folks who I’m sure will explode my brain. I’ve signed up as a “first timer” (my “big brother” is Craig Purnell), and I’ll probably also be up for the Monday night dinner at Lowell’s, so might see you there!

    Cheers,
    DB Dave

  • Getting ready for Tech Ed New Zealand

    Sky Tower, Auckland

    Tech·Ed NZ kicks off the day after tomorrow at the Auckland SkyCity convention centre.  I’ve been to a few in the past, so nothing new there – but what makes this year different is that a colleague and I will be presenting a session:

    DBI207 – Trade Me: 0-70 million pages in 0.34 Gigaseconds

    It’s not a terribly technical session, but rather a session about our (i.e. my employer’s) infrastructure & architecture.  Given the nature of our architecture it is a fairly database-centric session, although we’ll cover the network and application layers briefly as well. It’s at 13:55 on Wednesday if I’ve whetted your appetite.
    Oh, and if you don’t know what a “gigasecond” is, then Wikipedia is your friend.  It sounded better than “0-70 million pages in roughly 11 years“. 😉

    It’s a tweaked and updated version of the session I presented at the Wellington SQL Server User Group (along with my colleague, Moana), then at the Christchurch .Net User Group, and most recently the Auckland SQL Server User Group.

    Simon, our head of dev, is also presenting a session this year. His will be about our development and deployment processes, culminating in the live launch of a new Trade Me “thing” (I can’t really describe it any more accurately without hinting at what it is, and it’s super top secret squirrel and I’ll be shot if I let it slip early). His session is at 13:55 on Thursday, so it’ll be worth a look-see.

    So yeah, hopefully it all goes well! If you’re coming along be sure to say hi before or after the session. There will be around 10 of us from Trade Me attending this year, so the chances of bumping into one of us are fairly good. 🙂

    Cheers
    DB Dave