Author: DB Dave

  • Look where you want to go

    My yellow machineI ride a motorbike to and from work every day. I thoroughly enjoy getting out on my Suzuki whenever possible – in fact I no longer even own a car. So obviously it was only a matter of time before I tried to wrangle in some kind of tenuous link between bikes and SQL Server. J So here goes…

    Quite a common mantra in the biking world is “look where you want to go”. The idea being that the bike goes towards where you’re looking; in other words, if you’re going around a corner, you should be looking through the corner to where you want to go, not staring at the road a few meters ahead of your front wheel.

    If you should look where you want to go, then we can assume that you shouldn’t look where you don’t want to go, right? Where this becomes really important is if/when you find yourself in a bad situation on the road. If you don’t want to ride into the pot-hole, or hit the back of the car which has suddenly stopped in front of you, then look for ways past the obstacle.

    Right, so what the hell does this have to do with the price of eggs?

    Well, a few days ago I was stuck in the office late, trying to fix a particularly stubborn issue with a new SQL instance. Everything had gone pretty smoothly, right up until the point that the full-text catalogues refused to build. I spent an hour or two trying everything I could think of, to no avail. So I went to the kitchen to make myself a coffee, and sat for a few minutes doing something completely unrelated to the full-text problem at hand (i.e. I whipped out the iPhone and read a few of the most recent blog posts I’d bookmarked in Instapaper).

    I returned to my desk, coffee in hand, and instead of diving straight back into what I had been trying (which obviously wasn’t working) I took a different tack and looked at the problem from a new angle. Another 15 minutes or so and I had the problem wrapped up.

    I had been so fixated on the problem that I could no longer think a way around it. This is the cause of many motorbike accidents, and it’s called “target fixation”. This is where the car pulls out in front of you and you think “WTF!? I’m going to hit their rear fender!”, shortly followed by you hitting their rear fender because that’s what you were staring at.

    Instead, we need to train ourselves to take in the fact that the car has pulled out, while also looking at the gap behind them which is still wide enough to fit through. Then once you’re safely through the gap, feel free to let loose the mighty single-digit salute. J

    Photo by hugo604bcThis incident reminded me to keep looking for alternative ways around a problem. This fix involved me using tools I don’t normally use, and looking at obscure logs which I normally don’t give a second thought – all of which made it that much more satisfying. So the next time you’re troubleshooting a stubborn issue, make sure you don’t get fixated on what’s right in front of you. Rather, look for a way around the inconsiderate prick in the car issue.

    Now go and give your server an oil change, and lube your chain. Ok, enough motorbike analogies (for today).

    Cheers,
    DB Dave

  • Full-Text Search Operators – Part 2: Replacing alternate operators

    Photo by natashalcdIn this post I’m going to deal with another piece of the puzzle that I described in part 1. Now, after re-reading that post it became obvious that I probably need to outline what the end result is meant to look like:

    We want to be able to pass in a search string, and get back a nicely formatted full-text search clause which makes full use of Boolean operators, respects parentheses, and supports wildcards. Easy, right? Yup, and we need to do it all using only T/SQL.

    Here’s the steps that make up the final function (in a nutshell):

    1. Remove some dodgy characters (vertical-tab, form-feed, etc)
    2. Tidy up any parentheses [part 1]
    3. Replace any alternate operators with place-holders (most operators have 2 representations; e.g. OR and |, AND and &, NOT and , etc).
    4. Break the search term into individual words (i.e. split the string on space characters)
    5. Using this list of individual words, do the following:
      1. Classify each word as an operator, search term, parenthesis, wildcard, etc.
      2. Tidy up the list, removing illegal and unnecessary characters
      3. Remove invalid operators (e.g. “ipod NOT” is not a valid search term)
      4. Remove unnecessary parentheses (e.g. single words wrapped in parentheses)
      5. Handle wildcards. Since full-text only accepts prefix searches, change all invalid wildcard searches into prefix searches.
    6. Check for single and double-quotes (& tidy up any invalid quote-marks, make sure only valid pairs remain, etc.).
    7. Process any NEAR(~) operators.
    8. Put the final Full-Text command string together.

    Right, so we’ve already taken care of number 2 in the previous post, so now on to number 3 – replacing the alternate operators.  This step may not be necessary for you, since it’s sole purpose is to preserve the original search string for our reporting purposes.

    You see, if I search for “ipad OR iphone NOT ipod” and you search for “ipad | iphone –ipod”, then obviously since they’re really the same search we will get the same results (and our cache should also cache them as 1 search), but for reporting purposes we want to know that our searches were typed in differently.  The way we do that in our implementation of this function, is that we return a normalised search string used to perform & cache the search, and an original string used for reporting.

    Here’s the very straight forward code:

    IF OBJECT_ID('[dbo].[fn_search_clause_operator_replacer]') IS NULL
    EXEC('CREATE FUNCTION [dbo].[fn_search_clause_operator_replacer] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This is a supporting function, which is called by the function [fn_search_clause_get].
    It accepts a string input, and outputs the same string, but with any character operators replaced with
    placeholder text-operators instead.
    Author : David Curlewis
    Date : 07/2009
    
    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_search_clause_operator_replacer] (
    @search_text NVARCHAR(4000)
    ) RETURNS NVARCHAR(4000)
    AS
    BEGIN
    DECLARE @not_pattern VARCHAR(50),
    @not_pos INT;
    
    SET @search_text = REPLACE(REPLACE(REPLACE(@search_text,
    '|', ' `OR` '), -- OR
    '~', ' `NEAR` '), -- NEAR
    '&', ' `AND` '); -- AND
    
    -- the NOT operator requires some extra work because we only consider it an operator if it is not hyphenating a word
    SET @not_pattern = '%[^a-z0-9][-][ a-z0-9"''(]%'; -- valid NOT operator pattern
    SET @not_pos = ISNULL((SELECT PATINDEX(@not_pattern, @search_text)), 0); -- get the positions of any valid "-" operators
    WHILE (@not_pos > 0)
    BEGIN
    SET @search_text = STUFF(@search_text, @not_pos, 2, ' `NOT` ');
    SET @not_pos = ISNULL((SELECT PATINDEX(@not_pattern, @search_text)), 0); -- get pos of next operators (if any)
    END
    
    RETURN(@search_text);
    END
    GO

    The first bit is very simple; it’s just a REPLACE statement replacing each operator’s “symbol operator” (i.e. |, ~, &) with a bit of placeholder text (i.e. `OR`, `NEAR`, `AND`).

    The NOT operators require a little more work because we don’t want to interpret all hyphens as operators, otherwise you wouldn’t be able to search for hyphenated words. Using PATINDEX we can very easily match on a simple RegEx-like pattern: [^a-z0-9][-][ a-z0-9″”(]

    This looks for any occurrences of the hyphen, where it follows a non-alphanumeric character, and precedes a space, an alphanumeric character, a double or single quote, or an opening parenthesis. It then replaces each occurrence with the placeholder text `NOT`.

    In the next post I’ll go through the meatiest bit, which involves splitting the string (number 4), and preparing the resulting data (number 5).

    Cheers

    DB Dave

  • Removing accented and illegal characters from a string

    This post isn’t related to the series I’m currently writing on Full-Text search operators, but it is in the same ball-park so I thought I’d throw it up here.

    Today I was working on our search term auto-suggest system (i.e. the system that shows a list of suggestions in a drop-down list below the search box as you type your search term in), and needed to improve the way we strip out “illegal” characters from our dictionary terms.

    Illegal characters in our case refers to pretty much anything that isn’t alphanumeric, or a space, dollar sign, ampersand, hyphen, or period.  We don’t even want any accented characters (i.e. letters with diacritics) – although in this case we want to replace accented characters with their un-accented counterparts (e.g. Å should be replaced by A).

    The way I did this (and I’m sure there are many, many better ways) is to insert illegal characters into a table variable, along with a replacement character, and then update the input variable for each row in this table variable – replacing any matching illegal characters each time with it’s replacement value.

    IF OBJECT_ID('[dbo].[fn_clean_search_term]') IS NULL
    	EXEC('CREATE FUNCTION [dbo].[fn_clean_search_term] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This accepts a string input, and outputs the same string, but with any invalid characters removed, or replaced if they
    are accented characters. Note: not all accented characters are accounted for, only the most common ones in our data.
    Also, this does not handle Unicode characters.
    
    Author : David Curlewis
    Date : 07/2009
    
    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_clean_search_term] (
    	@keyword VARCHAR(2048)
    ) RETURNS VARCHAR(2048) WITH SCHEMABINDING
    AS
    BEGIN
    
    	-- Only do the heavy lifting if there are non-alphanumeric characters in the string
    	IF PATINDEX('%[^A-Za-z0-9 ]%', @keyword) > 0
    	BEGIN
    
    		-- if the string contains anything other than "valid" characters, strip them out
    		DECLARE @chars TABLE ([ascii] TINYINT NOT NULL, [new_char] VARCHAR(1));
    
    		;WITH illegal_chars AS
    		(
    			SELECT	0 AS [ascii]
    			UNION ALL
    			SELECT	[ascii] + 1
    			FROM	illegal_chars
    			WHERE	[ascii] < 255
    		)
    		INSERT	@chars ([ascii], [new_char])
    		SELECT	[ascii],
    				CASE -- Replace accented letters with non-accented letters to normalise the search terms
    				-- Unless your collation is case sensitive, the lower &amp; upper case variants are equivalent
    				WHEN CHAR([ascii]) IN ('á','à','â','ä','å') THEN 'a'
    				WHEN CHAR([ascii]) IN ('Á','Ã','Ä','Å','À','Â') THEN 'A'
    				WHEN CHAR([ascii]) IN ('ç') THEN 'c'
    				WHEN CHAR([ascii]) IN ('Ç') THEN 'C'
    				WHEN CHAR([ascii]) IN ('ê','é','ë','è') THEN 'e'
    				WHEN CHAR([ascii]) IN ('Ê','Ë','É','È') THEN 'E'
    				WHEN CHAR([ascii]) IN ('ï','í','î','ì') THEN 'i'
    				WHEN CHAR([ascii]) IN ('Í','Ì','Î','Ï') THEN 'I'
    				WHEN CHAR([ascii]) IN ('ñ') THEN 'n'
    				WHEN CHAR([ascii]) IN ('Ñ') THEN 'N'
    				WHEN CHAR([ascii]) IN ('ô','ö','ò','õ','ó','ø') THEN 'o'
    				WHEN CHAR([ascii]) IN ('Ó','Ô','Õ','Ø','Ö','Ò') THEN 'O'
    				WHEN CHAR([ascii]) IN ('š') THEN 's'
    				WHEN CHAR([ascii]) IN ('Š') THEN 'S'
    				WHEN CHAR([ascii]) IN ('ú','ü','û','ù') THEN 'u'
    				WHEN CHAR([ascii]) IN ('Ù','Ú','Ü','Û') THEN 'U'
    				WHEN CHAR([ascii]) IN ('ÿ','ý') THEN 'y'
    				WHEN CHAR([ascii]) IN ('Ÿ','Ý') THEN 'Y'
    				WHEN CHAR([ascii]) IN ('ž') THEN 'z'
    				WHEN CHAR([ascii]) IN ('Ž') THEN 'Z'
    				ELSE '' -- remove the character
    				END AS [new_char]
    		FROM	illegal_chars
    		WHERE	[ascii] NOT BETWEEN 48 AND 57 -- 0 -> 9
    		AND		[ascii] NOT BETWEEN 65 AND 90 -- A -> Z
    		AND		[ascii] NOT BETWEEN 97 AND 122 -- a -> z
    		AND		[ascii] NOT IN ( -- explicitly allow the following characters
    					32, -- <space>
    					36, -- $
    					38, -- &amp;
    					45, -- -
    					46 -- .
    					)
    		OPTION (MAXRECURSION 255);
    
    		UPDATE @chars
    		SET @keyword = REPLACE(@keyword, CHAR([ascii]), [new_char])
    
    	END
    
    	-- remove multiple spaces
    	WHILE CHARINDEX(' ', @keyword) > 0
    	SELECT @keyword = REPLACE(@keyword, ' ', ' ');
    
    	SELECT @keyword = LTRIM(RTRIM(@keyword)); -- trim leading &amp; trailing whitespace
    	SELECT @keyword = LOWER(@keyword); -- return lower-case only
    
    	RETURN @keyword;
    END
    GO

    Let me know if you have any questions, or different/better ways of doing this (yes, I know I could do it using a couple of lines of .Net in a CLR function, but I don’t have CLR enabled on this server). Smile

    Cheers

    DB Dave

  • Full-Text Search Operators – Part 1: Cleaning up parentheses in a string

    BooleanBeards

    Part 1: Cleaning up parentheses in a string
    Part 2: Replacing alternate operators
    Part 3: Splitting the search string
    Part 4: Wrapping it up

    A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality.  These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.

    I looked around but couldn’t find anything implemented using T/SQL.  There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day.  So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:

    1. Must support the operators AND, OR, NOT, and NEAR
    2. Must allow for “exact-phrase” searches
    3. Should respect parentheses (for precedence control)
    4. Should be able to perform wildcard searches
    5. Operators must be case-sensitive (i.e. OR != or)

    To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirements.  In this post I’m going to introduce the first piece of the puzzle (which is number 3 from the list above; handling parentheses).  What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).

    Below is the function. I won’t explain it line by line since its pretty well commented, and relatively simple anyway.

    IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL
    EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This is a supporting function, which is called by the function [fn_search_clause_get].
    It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining
    valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function.
    Author : David Curlewis
    Date : 07/2009
    
    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] (
    @search_text NVARCHAR(4000)
    ) RETURNS NVARCHAR(4000)
    AS
    BEGIN
    -------------------------------------------------------------------------------------------------
    -- PARENTHESES HANDLER
    -------------------------------------------------------------------------------------------------
    -- IF parentheses exist in the string
    IF (PATINDEX('%[()]%', @search_text) > 0)
    BEGIN
    DECLARE @pos_o INT, @pos_x INT, @charpos INT;
    DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL);
    SET @charpos = 1;
    
    -- split the string apart into a temp table
    WHILE @charpos <= LEN(@search_text)
    BEGIN
    INSERT @tt_char_split ([char])
    SELECT SUBSTRING(@search_text, @charpos, 1);
    
    SET @charpos = @charpos + 1;
    END
    
    -- while we have opening and closing parentheses
    WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(')
    AND EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')')
    BEGIN
    -- Get the position of the first closing parenthesis
    SET @pos_x = ( SELECT MIN(id)
    FROM @tt_char_split
    WHERE [char] = ')');
    
    -- Get the position of the first opening parenthesis
    SET @pos_o = ( SELECT MAX(id)
    FROM @tt_char_split
    WHERE [char] = '('
    AND id < @pos_x);
    
    -- there is a valid pair of parentheses
    IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL)
    BEGIN
    -- Escape this pair so we know they've been processed
    UPDATE @tt_char_split
    SET [char] = '(('
    WHERE id = @pos_o;
    
    UPDATE @tt_char_split
    SET [char] = '))'
    WHERE id = @pos_x;
    END
    ELSE BEGIN -- there is not a valid pair of parentheses
    UPDATE @tt_char_split
    SET [char] = ''
    WHERE id IN (@pos_o, @pos_x);
    END
    END
    
    -- remove any remaining (invalid) parentheses
    UPDATE @tt_char_split
    SET [char] = ''
    WHERE [char] IN ('(', ')');
    
    SET @search_text = '';
    
    -- build new search string
    SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ')
    FROM @tt_char_split
    ORDER BY id;
    
    -- remove empty pairs, i.e. " ( ) "
    WHILE CHARINDEX(' ( ) ', @search_text) > 0
    SET @search_text = REPLACE(@search_text, ' ( ) ', '');
    END
    
    RETURN(@search_text);
    END
    GO

    So, its a scalar function which accepts a string, and returns that same string, but with it’s parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).

    Now, as I sit here watching the latest straight-to-dvd masterpiece that is “Barbie: A fashion fairy-tale(hey, its a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.

    Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky. Winking smile

    Cheers

    DB Dave

  • Generate scripts to detach/attach all databases

    I haven’t had time to do any Denali testing since my last post, so instead I thought I’d share a script I wrote a while ago which simply generates a script to attach one or more databases using their current file paths (and optionally then detaches these databases).

    The reason I wrote this script in the first place was because we have a number of environment (i.e. dev, test, stage, reporting, etc), and some of these databases are made up of quite a few database files (i.e. more than a dozen or so).
    So on the odd occasion when I’ve needed to detach a database, move some files around, and re-attach said database using the Management Studio GUI, I’ve found it frustrating having to manually browse to each file’s location over and over again. And I just prefer doing stuff in T/SQL over using the GUI – makes me feel more in control I think.

    So, with this script I could easily generate the “attach” code for each database (including all file paths), make any path or filename changes necessary, and re-attach all the databases with a single mouse click!

    USE [master];
    
    DECLARE @database NVARCHAR(200),
    @cmd NVARCHAR(1000),
    @attach_cmd NVARCHAR(4000),
    @file NVARCHAR(1000),
    @i INT;
    
    DECLARE dbname_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
    SELECT  RTRIM(LTRIM([name]))
    FROM    sysdatabases
    WHERE   [dbid] &gt; 4  -- exclude system databases
    
    OPEN dbname_cur
    FETCH NEXT FROM dbname_cur INTO @database
    WHILE @@FETCH_STATUS = 0
    BEGIN
    SELECT @i = 1;
    
    -- Initial attach command stub
    SET @attach_cmd = '-- ' + QUOTENAME(@database) + CHAR(10)
    + 'EXEC sp_attach_db @dbname = ''' + @database + '''' + CHAR(10);
    
    -- Get a list of files for this database
    DECLARE dbfiles_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
    SELECT  physical_name
    FROM    sys.master_files
    WHERE   database_id = db_id(@database)
    ORDER   BY [file_id];
    
    OPEN dbfiles_cur
    FETCH NEXT FROM dbfiles_cur INTO @file
    WHILE @@FETCH_STATUS = 0
    BEGIN
    SET @attach_cmd = @attach_cmd + '    ,@filename' + CAST(@i AS NVARCHAR(10)) + ' = ''' + @file + '''' + CHAR(10);
    SET @i = @i + 1;
    FETCH NEXT FROM dbfiles_cur INTO @file
    END
    
    CLOSE dbfiles_cur;
    DEALLOCATE dbfiles_cur;
    
    -- Output "attach" command.
    PRINT @attach_cmd;
    
    -- DETACH (uncomment the following line at your peril - IT WILL DETACH ALL USER DATABASES!)
    /*EXEC sp_detach_db @dbname = @database, @skipchecks = 'true';  -- change this to false if you want it to update stats before detaching*/
    
    FETCH NEXT FROM dbname_cur INTO @database
    END
    
    CLOSE dbname_cur;
    DEALLOCATE dbname_cur;

    The usual disclaimers apply; I’ve used this in a limited number of scenarios so I wouldn’t consider it thoroughly tested, and if you run it in a production environment and “accidentally” detach databases, or otherwise break something – don’t blame me.

    Cheers
    Dave

  • Do you want a job? Then don’t do this…

    We’re always looking for good “database folk” (i.e. DBA’s or database developers).  Its fair to say that the pickings have been pretty slim for some time now, but we still get a constant trickle of applications that make me want to laugh (and sometimes cry).

    Here are a few “pain-points” I’ve come across recently in our recruiting efforts:


    Dnt us txt-spk in yr application

    While writing in “txt-speak” may convey to your potential future employer that you’re young and in touch with modern pop-culture, it doesn’t instil a great deal of confidence in your intellectual abilities. I like to know that you can actually spell “thanks” (hint, it doesn’t have an “x” in it).

    Don’t just use a crappy cover-letter template

    I’ve had a few cases of deja vu where several candidates have had remarkably similar letters, if not exactly the same.  I’ve even had one applicant leave in the template place-holders… I’m sorry, we don’t currently have an opening for a [job-title] here at [company-name].

    Make sure you send the right stuff

    I’m all for you customising your cover-letters, and even CV’s, to reflect the position you’re applying for, but make 110% sure that you send me the right ones. If you’re applying for a “Junior DBA” role at “Company A”, make sure your cover-letter doesn’t thank me for considering your application for the position of “PHP Developer” at “Company B” (which, by the way, is way cooler than Company A).

    Read the advertisement

    Seems simple enough, right?  Just double check that the company isn’t asking for something in addition to the usual cover-letter & CV combo.  We often ask candidates to answer several questions in their cover-letters.  These questions usually relate to the position or the company, and help to show us how you think, how much effort you’re willing to put into your application, etc.  Needless to say, you may get a few “negative points” if you overlook the questions entirely.


    Are there any I’ve missed? Anything you don’t agree with? Leave me a comment with your thoughts.

    Cheers
    Dave