Tag: script

  • Benchmarking I/O: The test rig

    Work has been a little crazy lately, but I’ve still been wanting to do some I/O performance testing, since we’re due to get some new servers in soon and I wanted to be able to compare apples with oranges. So this past weekend I decided to combine something I’ve been meaning to learn more about (PowerShell) with some SQL scripts based on those described by Paul Randal (blog | twitter) in his post about testing Fusion-IO ioDrive Duo SSDs.

    Just to be clear before going any further, I’ve tested these scripts in small scale (i.e. on my laptop) and they worked well, but I’m happy to have any bugs and/or crappy code pointed out to me in the comments. 🙂 Thanks.

    So, because I tend to over-engineer things, I laid out my requirements early on to prevent any self-induced scope-creep. These were:

    • It must accept parameters for anything the user would need to change
    • It must create the test database and any other required objects
    • It should automatically re-run the test the specified number of times (to allow averaging of results)
    • It should automatically record test results without requiring me to intervene between test runs
    • If possible, it should be able to automatically change test configurations (e.g. number of database data files) and re-run the tests without intervention

    The jury’s still out on how many of these goals I achieved, but let’s dive in anyway. I’m not going to post snippets of all the code – otherwise it’ll just get too bulky.  Instead I’ll highlight the more important bits, and then upload the whole shebang at the end for you to download.

    The PowerShell script basically controls the flow of the test process, but there are 5 SQL scripts which do the actual heavy lifting. The first 2 files create the database (with the specified number of data files, and of the specified size), as well as a few tables used in the test:

    -------------------------------------------------------------------------- 
    -- Create test database 
    --------------------------------------------------------------------------
    SET @cmd = N'
    	IF DB_ID(''' + @db_name + ''') IS NOT NULL DROP DATABASE [' + @db_name + ']; 
    	CREATE DATABASE [' + @db_name + '] ON PRIMARY (
    		NAME = N''' + @db_name + ''', 
    		SIZE = ' + CAST(@data_file_size_mb AS NVARCHAR) + 'MB , 
    		FILEGROWTH = 250MB, 
    		FILENAME = N''' + @db_data_path + @db_name + '.mdf'')'; 
    
    -- Add additional files 
    WHILE @i < $(NoOfDataFiles) 
    BEGIN 
    	SET @cmd = @cmd + CHAR(10) + ',(NAME = N''' + @db_name + CAST(@i AS NVARCHAR) 
    		+ ''', SIZE = ' + CAST(@data_file_size_mb AS NVARCHAR) + 'MB , FILEGROWTH = 250MB, FILENAME = N''' 
    		+ @db_data_path + @db_name + CAST(@i AS NVARCHAR) + '.ndf'')'; 
    		
    	SET @i = @i + 1; 
    END 
    
    SET @cmd = @cmd + CHAR(10) + ' LOG ON (NAME = N''' + @db_name + '_log'', SIZE = ' 
    	+ CAST(CASE WHEN $(TargetDbSizeGB) <= 1 THEN 128 WHEN $(TargetDbSizeGB) > 1 AND $(TargetDbSizeGB) <= 5 
    		THEN 512 WHEN $(TargetDbSizeGB) > 5 AND $(TargetDbSizeGB) <= 10 THEN 2048 ELSE 4096 END AS NVARCHAR) 
    		+ 'MB, FILEGROWTH = 100MB, FILENAME = N''' + @db_log_path + @db_name + '_log.ldf''); ALTER DATABASE [' 
    		+ @db_name + '] SET RECOVERY SIMPLE;'; 
    
    EXEC sp_executesql @cmd, N'';
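    The $(NoOfDataFiles) and $(TargetDbSizeGB) tokens are SQLCMD scripting variables – the PowerShell script passes them in via sqlcmd’s -v switch (you’ll see this in the job snippet further down). If you wanted to run the script on its own in SQLCMD mode you could just set them at the top instead; for example (the values here are obviously just placeholders):

    -- Hypothetical values for a standalone run in SQLCMD mode; the test rig normally supplies
    -- these via something like: sqlcmd -i <script> -v NoOfDataFiles=4 TargetDbSizeGB=10
    :setvar NoOfDataFiles 4
    :setvar TargetDbSizeGB 10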

    The main work table is copied and pasted from Paul’s post mentioned earlier, and has a clustered index on a GUID column, which forces random I/O. I guess you could swap this for an integer/identity column to get sequential I/O instead.
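    For reference, it looks something along these lines (this is just an illustrative sketch – the real definition is in Paul’s post and in the download at the end; the important bits are the NEWID() default and the clustered index on the GUID column):

    -- Illustrative sketch only; column names and filler sizes are made up.
    CREATE TABLE MyBigTable (
        c1 UNIQUEIDENTIFIER DEFAULT NEWID(), -- random GUIDs + clustered index = random I/O
        c2 DATETIME DEFAULT GETDATE(),
        c3 CHAR(400) DEFAULT 'a',            -- filler to pad each row out to a realistic size
        c4 INT DEFAULT 1
    );
    CREATE CLUSTERED INDEX CIX_MyBigTable_c1 ON MyBigTable (c1);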

    The 3rd SQL script performs the inserts, and is copied from a related post from Thomas Kejser (blog) – I just added a bit that works out how many rows to insert based on the target database size specified.
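    I won’t copy that bit verbatim here, but it boils down to something like this (a rough sketch – the variable name matches the one used in the loop below, the $(…) tokens are the same -v parameters the PowerShell script passes in, and the exact arithmetic is in the download):

    -- Roughly: target size in bytes, divided by the approximate row size, split across the insert processes
    DECLARE @no_of_rows_to_insert_per BIGINT;
    SET @no_of_rows_to_insert_per = ($(TargetDbSizeGB) * 1073741824.0) / $(ApproxRowByteSize) / $(NoOfInsertProcesses);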

    SELECT @counter = 0, @start = GETDATE();

    WHILE (@counter < @no_of_rows_to_insert_per)
    BEGIN
    	-- open a transaction at the start of each ~1000-row batch
    	IF @counter % 1000 = 0
    		BEGIN TRAN;

    	INSERT INTO MyBigTable DEFAULT VALUES;
    	SET @counter = @counter + 1;

    	-- commit the batch
    	IF @counter % 1000 = 999
    		COMMIT TRAN;
    END;

    -- make sure nothing is left open if the row count isn't an exact multiple of 1000
    IF @@TRANCOUNT > 0
    	COMMIT TRAN;

    SET @end = GETDATE();
    INSERT INTO Results (duration) VALUES (CAST(DATEDIFF(SS, @start, @end) AS INT));

    The 4th script is taken directly from this post of Paul’s on wait stats – it’s definitely one worth adding to your toolbox.

    Finally, the 5th script aggregates each test run’s results into one row, so that the test rig can automatically blow everything away and run again without losing any results.

    One of the biggest challenges I faced (as the PowerShell n00b that I am) was figuring out how to asynchronously launch multiple processes and wait for them to finish. In “batch file land” I would’ve just thrown together some “FOR %%i IN (…) DO START sqlcmd …” magic, but I wouldn’t allow myself to use anything but PowerShell for the flow control. After some time spent searching around I found Start-Job, which allowed me to launch a given number of instances of SQLCMD and then wait for them all to finish by calling Wait-Job (obviously!). 😉

    ###################################################################### 
    # Kick off the specified number of insert processes ($NoOfInsertProcesses) 
    ###################################################################### 
    $sj = {
    	param($SQLInstanceName, $TestDBName, $InsertRowsPath, $TargetDbSizeGB, $ApproxRowByteSize, $NoOfInsertProcesses)
    	sqlcmd -S "$SQLInstanceName" -d "$TestDBName" -i "$InsertRowsPath" -v TargetDbSizeGB=$TargetDbSizeGB ApproxRowByteSize=$ApproxRowByteSize NoOfInsertProcesses=$NoOfInsertProcesses
    }

    # Launch the jobs, then wait for all insert processes to finish before continuing
    $(for ($x = 1; $x -le $NoOfInsertProcesses; $x++) {
    	Start-Job $sj -ArgumentList $SQLInstanceName, $TestDBName, $InsertRowsPath, $TargetDbSizeGB, $ApproxRowByteSize, $NoOfInsertProcesses
    }) | Wait-Job

    Funnily enough, one of my main “Aha!” moments came while reading this post by Aaron Bertrand (blog | twitter), who isn’t your typical hard-core PowerShell or Windows Server blogger, but a SQL Server MVP – so either it was a sign I was on the right path, or Google knows far too much about my reading habits. 🙂

    Okay – please feel free to download the scripts, tear them apart, point and laugh, etc (although, if you point and laugh, please at least tell me why so I can learn from it). As I said before, I haven’t run them on anything apart from my laptop (which I think would’ve taken several years to perform the full-blown test), so I’d be interested to hear of your experiences.

    IOTestScripts

    In the coming weeks I’m hoping to benchmark several of our existing servers (sitting on top of various storage platforms, such as SAN – both solid state & fibre channel, local disk, NAS, and RAM disks).

    Cheers,

    DB Dave

  • Full-Text Search Operators – Part 3: Splitting the search string

    Part 1: Cleaning up parentheses in a string
    Part 2: Replacing alternate operators

    We’ve now got to the point where we have a “clean” search string.  The next step is to split the string into rows which we can then manipulate without having to traverse the string itself.

    There are many “split to row” table-valued functions out there, and I’d recommend using a CLR-based one if you can (for performance reasons).  Basically the function accepts an input string and a delimiter, and outputs a table of individual search terms.
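    If you don’t have one handy, here’s a rough T-SQL version just to illustrate the shape of the thing – the name and the id/data columns match the way it’s called in the CTE below, but treat it as a sketch rather than the exact function I use (a CLR splitter is a drop-in replacement and will be faster on long strings):

    -- Illustrative splitter only; note that it retains the case of each word, which matters later on.
    CREATE FUNCTION [dbo].[fn_split_to_rows] (
    	@input NVARCHAR(4000),
    	@delimiter NCHAR(1)
    ) RETURNS @rows TABLE (id INT IDENTITY(1,1), data NVARCHAR(4000))
    AS
    BEGIN
    	DECLARE @pos INT = CHARINDEX(@delimiter, @input);

    	WHILE @pos > 0
    	BEGIN
    		INSERT @rows (data) VALUES (LEFT(@input, @pos - 1));
    		SET @input = SUBSTRING(@input, @pos + 1, LEN(@input));
    		SET @pos = CHARINDEX(@delimiter, @input);
    	END

    	INSERT @rows (data) VALUES (@input); -- the last (or only) word
    	RETURN;
    END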

    Instead of simply inserting these rows into a temporary table or table variable at this stage, I first use a series of CTEs to further refine the data.

    -------------------------------------------------------------------------------------------------
    -- SPLIT SEARCH STRING INTO INDIVIDUAL SEARCH WORDS
    -------------------------------------------------------------------------------------------------
    ;WITH
    banana_split AS
    (
        SELECT  id AS rowid,
                LTRIM(RTRIM(data)) AS word
        FROM    [dbo].[fn_split_to_rows](@search_text, ' ') -- whatever split method you use, make sure it retains the case if you rely on UPPER-CASE operators
        WHERE   LEN(LTRIM(RTRIM(data))) BETWEEN 1 AND 100 -- any single "word" longer than 100 chars is likely rubbish (http://en.wikipedia.org/wiki/Longest_word_in_English) ;-)
    ),

    This then feeds into a 2nd CTE which classifies each individual search term:

    words_classified AS
    (
        -- ********************************************************************************************
        -- The order of the CASE clauses below is important in deciding the order of preference
        -- given to the different special characters. For example, if a string contains both an
        -- asterisk and double quotes (e.g. "search*term") then the operator that appears first in
        -- the CASE statement below will take effect (e.g. either "search term" or search*).
        -- ********************************************************************************************
        SELECT  rowid,
                word,
                CASE
                    WHEN word = '(' THEN 'paren_o' -- Opening parenthesis
                    WHEN word = ')' THEN 'paren_x' -- Closing parenthesis
                    WHEN LEFT(word,1) = '"' OR LEFT(word,1) = '''' THEN 'search_dq_o' -- An opening exact phrase-quote
                    WHEN RIGHT(word,1) = '"' OR RIGHT(word,1) = '''' THEN 'search_dq_x' -- A closing exact phrase-quote
                    WHEN LEFT(word,1) = '"' AND RIGHT(word,1) = '"' THEN 'searchexplicit' -- An exact term search (i.e. "blah")
                    WHEN LEFT(word,1) = '''' AND RIGHT(word,1) = '''' THEN 'searchexplicit' -- An exact term search (i.e. 'blah')
                    WHEN LEFT(word,1) = '+' THEN 'searchexplicit' -- An exact term search (i.e. +blah)
                    -- We explicitly use a case-sensitive (binary) collation to ensure case sensitivity for operators.
                    WHEN (word COLLATE Latin1_General_BIN2) = ('AND' COLLATE Latin1_General_BIN2) THEN 'op_and' -- AND operator
                    WHEN (word COLLATE Latin1_General_BIN2) = ('OR' COLLATE Latin1_General_BIN2) THEN 'op_or' -- OR operator
                    WHEN (word COLLATE Latin1_General_BIN2) = ('NEAR' COLLATE Latin1_General_BIN2) THEN 'op_near' -- NEAR operator
                    WHEN (word COLLATE Latin1_General_BIN2) = ('NOT' COLLATE Latin1_General_BIN2) THEN 'op_not' -- NOT operator
                    WHEN word = '`AND`' THEN 'op_and`' -- AND operator (&)
                    WHEN word = '`OR`' THEN 'op_or`' -- OR operator (|)
                    WHEN word = '`NEAR`' THEN 'op_near`' -- NEAR operator (~)
                    WHEN word = '`NOT`' THEN 'op_not`' -- NOT operator (-)
                    WHEN CHARINDEX('*', word) > 0 THEN 'searchprefix' -- Wildcard operator (*)
                    ELSE 'searchword'
                END AS operator
        FROM    banana_split
    ),

    I wanted the search operators to be case sensitive in order not to change the behaviour of searches where people were including the words “and”, “or”, “not”, or “near”. This means that I needed to specify a case-sensitive collation for the string comparisons (you can obviously remove this if you don’t want case-sensitivity, or if your database already uses a case-sensitive collation).
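    For example, with the binary collation in place:

    -- Under a typical case-insensitive database collation 'or' = 'OR' is true;
    -- the binary collation makes the comparison case-sensitive.
    SELECT CASE WHEN ('or' COLLATE Latin1_General_BIN2) = ('OR' COLLATE Latin1_General_BIN2)
                THEN 'treated as an operator'
                ELSE 'treated as a plain search word'
           END; -- returns 'treated as a plain search word'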

    Now another CTE tidies up by removing some superfluous characters, etc.:

    -------------------------------------------------------------------------------------------------
    -- NOW TIDY UP THE WORDS AND OPERATORS
    -------------------------------------------------------------------------------------------------
    tidy_up_round1 AS
    (
        -- tidy up some of the words
        SELECT  rowid,
                REPLACE(
                    CASE
                        WHEN operator = 'searchexplicit' THEN REPLACE(word, '+', '') -- remove +'s from explicit searches
                        WHEN operator = 'search_dq_o' THEN RIGHT(word, LEN(word)-1) -- remove the opening quote
                        WHEN operator = 'search_dq_x' THEN LEFT(word, LEN(word)-1) -- remove the closing quote
                        ELSE word
                    END,
                    '"', '') AS word, -- we can just kill all double-quotes now (any that are in the middle of words are not valid)
                operator
        FROM    words_classified
        WHERE   operator NOT IN ('op_and', 'op_and`') -- remove "AND" operators completely since these are the default inter-term operators anyway
        AND     (PATINDEX('%[0-9a-z]%', word) > 0 OR operator LIKE 'paren%') -- only words which have 1 or more alphanumeric chars in them (unless it's a parenthesis)
    ),
    tidy_up_round2 AS
    (
        SELECT  ROW_NUMBER() OVER(ORDER BY rowid) AS rowid,
                word,
                operator
        FROM    tidy_up_round1
        WHERE   rowid BETWEEN -- exclude any operators that are incorrectly located at the beginning or end of the string
                (SELECT MIN(rowid) FROM tidy_up_round1 WHERE REPLACE(operator, '`', '') NOT IN ('op_not', 'op_near', 'op_or'))
                AND (SELECT MAX(rowid) FROM tidy_up_round1 WHERE REPLACE(operator, '`', '') NOT IN ('op_not', 'op_near', 'op_or'))
    ),

    Next, remove consecutive & duplicate operators…


    -------------------------------------------------------------------------------------------------
    -- REMOVE CONSECUTIVE OPERATORS (LIKE "OR NOT" OR "NEAR NEAR", WHICH AREN'T VALID).
    -------------------------------------------------------------------------------------------------
    de_duped AS
    (
        SELECT  a.rowid,
                a.word,
                a.operator
        FROM    tidy_up_round2 a
        LEFT JOIN tidy_up_round2 b
        ON      a.rowid = (b.rowid + 1) -- consecutive rowid's
        AND     a.operator LIKE 'op%' -- only operators
        AND     b.operator LIKE 'op%' -- only operators
        WHERE   b.rowid IS NULL
    ),

    
    …and exclude any empty or unnecessary parentheses:


    -------------------------------------------------------------------------------------------------
    -- GET PARENTHESIS GROUPS, AND THEN THE EMPTY/UNNECESSARY PAIRS (i.e. "()" or "(blah)")
    -------------------------------------------------------------------------------------------------
    parentheses_groups AS
    (
        SELECT  ROW_NUMBER() OVER(ORDER BY rowid) AS prowid,
                rowid,
                word,
                operator
        FROM    de_duped
        WHERE   operator IN ('paren_o', 'paren_x')
        OR      operator LIKE 'search%'
    ),
    empty_pairs AS
    (
        SELECT  a.rowid AS rowid_o,
                b.rowid AS rowid_x
        FROM    parentheses_groups a
        JOIN    parentheses_groups b
        ON      a.operator = 'paren_o' -- opening parenthesis
        AND     b.operator = 'paren_x' -- closing parenthesis
        AND     (
                    b.prowid = (a.prowid + 1) -- where opening and closing are consecutive rows (i.e. empty)
                    OR b.prowid = (a.prowid + 2) -- where opening and closing are 1 row apart (i.e. they surround a single search term - this is unnecessary)
                )
    ),
    -------------------------------------------------------------------------------------------------
    -- NOW EXCLUDE EMPTY PARENTHESES GROUPS FROM THE FINAL RESULT SET
    -------------------------------------------------------------------------------------------------
    no_empties AS
    (
        SELECT  a.rowid,
                a.word,
                a.operator
        FROM    de_duped a
        LEFT JOIN empty_pairs x
        ON      (a.rowid = x.rowid_o OR a.rowid = x.rowid_x)
        WHERE   x.rowid_o IS NULL
    ),

    And lastly, format any wildcard (i.e. prefix) search terms, and insert the results into a table variable:

    -------------------------------------------------------------------------------------------------
    -- WILDCARD HANDLER
    -------------------------------------------------------------------------------------------------
    wildcard_handler AS
    (
        SELECT  ROW_NUMBER() OVER(ORDER BY rowid) AS rowid,
                CASE operator
                    WHEN 'searchprefix' THEN
                        '"' + CASE
                                  WHEN (LEFT(word, 1) <> '*') AND (RIGHT(word, 1) <> '*') THEN LEFT(word, CHARINDEX('*', word)-1)
                                  ELSE REPLACE(word, '*', '')
                              END + '*"'
                    ELSE word
                END AS word,
                operator
        FROM    no_empties
    )
    -------------------------------------------------------------------------------------------------
    -- FINALLY, INSERT THE RESULTS INTO A TABLE VARIABLE
    -------------------------------------------------------------------------------------------------
    INSERT  @tt_searchwords (rowid, word, operator)
    SELECT  rowid, word, operator
    FROM    wildcard_handler;

    Phew. Enough for this post I think.

    So, at this point if I pass in the following search string – sql NEAR server -(oracle | mysql) – then my table variable should be populated with the following:

    Table variable contents
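    In other words, roughly this (the operator labels are the ones assigned in words_classified, and the `NOT`/`OR` placeholders come from the operator replacer covered in part 2):

    rowid  word     operator
    -----  -------  ----------
    1      sql      searchword
    2      NEAR     op_near
    3      server   searchword
    4      `NOT`    op_not`
    5      (        paren_o
    6      oracle   searchword
    7      `OR`     op_or`
    8      mysql    searchword
    9      )        paren_x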

    In the next (and final) post, I’ll wrap up the “near” operator & double-quote handlers, and the building of the final output string.

    See ya!

    DB Dave

  • Full-Text Search Operators – Part 2: Replacing alternate operators

    Photo by natashalcd

    In this post I’m going to deal with another piece of the puzzle that I described in part 1. Now, after re-reading that post it became obvious that I probably needed to outline what the end result is meant to look like:

    We want to be able to pass in a search string, and get back a nicely formatted full-text search clause which makes full use of Boolean operators, respects parentheses, and supports wildcards. Easy, right? Yup, and we need to do it all using only T/SQL.

    Here are the steps that make up the final function (in a nutshell):

    1. Remove some dodgy characters (vertical-tab, form-feed, etc)
    2. Tidy up any parentheses [part 1]
    3. Replace any alternate operators with place-holders (most operators have 2 representations; e.g. OR and |, AND and &, NOT and -, etc).
    4. Break the search term into individual words (i.e. split the string on space characters)
    5. Using this list of individual words, do the following:
      1. Classify each word as an operator, search term, parenthesis, wildcard, etc.
      2. Tidy up the list, removing illegal and unnecessary characters
      3. Remove invalid operators (e.g. “ipod NOT” is not a valid search term)
      4. Remove unnecessary parentheses (e.g. single words wrapped in parentheses)
      5. Handle wildcards. Since full-text only accepts prefix searches, change all invalid wildcard searches into prefix searches.
    6. Check for single and double-quotes (& tidy up any invalid quote-marks, make sure only valid pairs remain, etc.).
    7. Process any NEAR(~) operators.
    8. Put the final Full-Text command string together.

    Right, so we’ve already taken care of number 2 in the previous post, so now on to number 3 – replacing the alternate operators. This step may not be necessary for you, since its sole purpose is to preserve the original search string for our reporting purposes.

    You see, if I search for “ipad OR iphone NOT ipod” and you search for “ipad | iphone -ipod”, then since they’re really the same search we will get the same results (and our cache should treat them as one search), but for reporting purposes we want to know that the searches were typed in differently. The way we do that in our implementation of this function is to return a normalised search string used to perform & cache the search, and an original string used for reporting.

    Here’s the very straightforward code:

    IF OBJECT_ID('[dbo].[fn_search_clause_operator_replacer]') IS NULL
        EXEC('CREATE FUNCTION [dbo].[fn_search_clause_operator_replacer] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This is a supporting function, which is called by the function [fn_search_clause_get].
    It accepts a string input, and outputs the same string, but with any character operators replaced with
    placeholder text-operators instead.
    Author : David Curlewis
    Date : 07/2009

    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_search_clause_operator_replacer] (
        @search_text NVARCHAR(4000)
    ) RETURNS NVARCHAR(4000)
    AS
    BEGIN
        DECLARE @not_pattern VARCHAR(50),
                @not_pos INT;

        SET @search_text = REPLACE(REPLACE(REPLACE(@search_text,
                                '|', ' `OR` '),   -- OR
                                '~', ' `NEAR` '), -- NEAR
                                '&', ' `AND` ');  -- AND

        -- the NOT operator requires some extra work because we only consider it an operator if it is not hyphenating a word
        SET @not_pattern = '%[^a-z0-9][-][ a-z0-9"''(]%'; -- valid NOT operator pattern
        SET @not_pos = ISNULL((SELECT PATINDEX(@not_pattern, @search_text)), 0); -- get the positions of any valid "-" operators

        WHILE (@not_pos > 0)
        BEGIN
            SET @search_text = STUFF(@search_text, @not_pos, 2, ' `NOT` ');
            SET @not_pos = ISNULL((SELECT PATINDEX(@not_pattern, @search_text)), 0); -- get pos of next operators (if any)
        END

        RETURN(@search_text);
    END
    GO

    The first bit is very simple; it’s just a REPLACE statement replacing each operator’s “symbol operator” (i.e. |, ~, &) with a bit of placeholder text (i.e. `OR`, `NEAR`, `AND`).

    The NOT operator requires a little more work because we don’t want to interpret all hyphens as operators, otherwise you wouldn’t be able to search for hyphenated words. Using PATINDEX we can very easily match on a simple RegEx-like pattern: [^a-z0-9][-][ a-z0-9"'(]

    This looks for any occurrences of the hyphen, where it follows a non-alphanumeric character, and precedes a space, an alphanumeric character, a double or single quote, or an opening parenthesis. It then replaces each occurrence with the placeholder text `NOT`.
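    For example (the extra whitespace in the output is harmless – it disappears when the string is split into individual words in the next step):

    SELECT dbo.fn_search_clause_operator_replacer(N'ipad | iphone -ipod t-shirt');
    -- returns something like: ipad  `OR`  iphone `NOT` ipod t-shirt
    -- the leading hyphen becomes a NOT, while the hyphen inside "t-shirt" is left alone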

    In the next post I’ll go through the meatiest bit, which involves splitting the string (number 4), and preparing the resulting data (number 5).

    Cheers

    DB Dave

  • Removing accented and illegal characters from a string

    This post isn’t related to the series I’m currently writing on Full-Text search operators, but it is in the same ball-park so I thought I’d throw it up here.

    Today I was working on our search term auto-suggest system (i.e. the system that shows a list of suggestions in a drop-down list below the search box as you type your search term in), and needed to improve the way we strip out “illegal” characters from our dictionary terms.

    Illegal characters in our case refers to pretty much anything that isn’t alphanumeric, or a space, dollar sign, ampersand, hyphen, or period.  We don’t even want any accented characters (i.e. letters with diacritics) – although in this case we want to replace accented characters with their un-accented counterparts (e.g. Å should be replaced by A).

    The way I did this (and I’m sure there are many, many better ways) is to insert the illegal characters into a table variable, along with a replacement character, and then update the input variable once for each row in this table variable – replacing any matching illegal characters each time with its replacement value.

    IF OBJECT_ID('[dbo].[fn_clean_search_term]') IS NULL
    	EXEC('CREATE FUNCTION [dbo].[fn_clean_search_term] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This accepts a string input, and outputs the same string, but with any invalid characters removed, or replaced if they
    are accented characters. Note: not all accented characters are accounted for, only the most common ones in our data.
    Also, this does not handle Unicode characters.
    
    Author : David Curlewis
    Date : 07/2009
    
    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_clean_search_term] (
    	@keyword VARCHAR(2048)
    ) RETURNS VARCHAR(2048) WITH SCHEMABINDING
    AS
    BEGIN
    
    	-- Only do the heavy lifting if there are non-alphanumeric characters in the string
    	IF PATINDEX('%[^A-Za-z0-9 ]%', @keyword) > 0
    	BEGIN
    
    		-- if the string contains anything other than "valid" characters, strip them out
    		DECLARE @chars TABLE ([ascii] TINYINT NOT NULL, [new_char] VARCHAR(1));
    
    		;WITH illegal_chars AS
    		(
    			SELECT	0 AS [ascii]
    			UNION ALL
    			SELECT	[ascii] + 1
    			FROM	illegal_chars
    			WHERE	[ascii] < 255
    		)
    		INSERT	@chars ([ascii], [new_char])
    		SELECT	[ascii],
    				CASE -- Replace accented letters with non-accented letters to normalise the search terms
    				-- Unless your collation is case sensitive, the lower & upper case variants are equivalent
    				WHEN CHAR([ascii]) IN ('á','à','â','ä','å') THEN 'a'
    				WHEN CHAR([ascii]) IN ('Á','Ã','Ä','Å','À','Â') THEN 'A'
    				WHEN CHAR([ascii]) IN ('ç') THEN 'c'
    				WHEN CHAR([ascii]) IN ('Ç') THEN 'C'
    				WHEN CHAR([ascii]) IN ('ê','é','ë','è') THEN 'e'
    				WHEN CHAR([ascii]) IN ('Ê','Ë','É','È') THEN 'E'
    				WHEN CHAR([ascii]) IN ('ï','í','î','ì') THEN 'i'
    				WHEN CHAR([ascii]) IN ('Í','Ì','Î','Ï') THEN 'I'
    				WHEN CHAR([ascii]) IN ('ñ') THEN 'n'
    				WHEN CHAR([ascii]) IN ('Ñ') THEN 'N'
    				WHEN CHAR([ascii]) IN ('ô','ö','ò','õ','ó','ø') THEN 'o'
    				WHEN CHAR([ascii]) IN ('Ó','Ô','Õ','Ø','Ö','Ò') THEN 'O'
    				WHEN CHAR([ascii]) IN ('š') THEN 's'
    				WHEN CHAR([ascii]) IN ('Š') THEN 'S'
    				WHEN CHAR([ascii]) IN ('ú','ü','û','ù') THEN 'u'
    				WHEN CHAR([ascii]) IN ('Ù','Ú','Ü','Û') THEN 'U'
    				WHEN CHAR([ascii]) IN ('ÿ','ý') THEN 'y'
    				WHEN CHAR([ascii]) IN ('Ÿ','Ý') THEN 'Y'
    				WHEN CHAR([ascii]) IN ('ž') THEN 'z'
    				WHEN CHAR([ascii]) IN ('Ž') THEN 'Z'
    				ELSE '' -- remove the character
    				END AS [new_char]
    		FROM	illegal_chars
    		WHERE	[ascii] NOT BETWEEN 48 AND 57 -- 0 -> 9
    		AND		[ascii] NOT BETWEEN 65 AND 90 -- A -> Z
    		AND		[ascii] NOT BETWEEN 97 AND 122 -- a -> z
    		AND		[ascii] NOT IN ( -- explicitly allow the following characters
    					32, -- <space>
    					36, -- $
    					38, -- &
    					45, -- -
    					46 -- .
    					)
    		OPTION (MAXRECURSION 255);
    
    		UPDATE @chars
    		SET @keyword = REPLACE(@keyword, CHAR([ascii]), [new_char])
    
    	END
    
    	-- remove multiple spaces
    	WHILE CHARINDEX('  ', @keyword) > 0
    		SELECT @keyword = REPLACE(@keyword, '  ', ' ');
    
    	SELECT @keyword = LTRIM(RTRIM(@keyword)); -- trim leading & trailing whitespace
    	SELECT @keyword = LOWER(@keyword); -- return lower-case only
    
    	RETURN @keyword;
    END
    GO
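    For example, something like this (the exact result will depend on your collation and code page):

    SELECT dbo.fn_clean_search_term('Crème BrûléE @ $5!');
    -- should return: creme brulee $5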

    Let me know if you have any questions, or different/better ways of doing this (yes, I know I could do it with a couple of lines of .NET in a CLR function, but I don’t have CLR enabled on this server). 🙂

    Cheers

    DB Dave

  • Full-Text Search Operators – Part 1: Cleaning up parentheses in a string

    Part 1: Cleaning up parentheses in a string
    Part 2: Replacing alternate operators
    Part 3: Splitting the search string
    Part 4: Wrapping it up

    A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality.  These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.

    I looked around but couldn’t find anything implemented using T/SQL.  There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day.  So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:

    1. Must support the operators AND, OR, NOT, and NEAR
    2. Must allow for “exact-phrase” searches
    3. Should respect parentheses (for precedence control)
    4. Should be able to perform wildcard searches
    5. Operators must be case-sensitive (i.e. OR != or)

    To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirement. In this post I’m going to introduce the first piece of the puzzle (which is number 3 from the list above: handling parentheses). What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).

    Below is the function. I won’t explain it line by line since it’s pretty well commented, and relatively simple anyway.

    IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL
        EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END ');
    GO
    /***************************************************************************************************************************
    This is a supporting function, which is called by the function [fn_search_clause_get].
    It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining
    valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function.
    Author : David Curlewis
    Date : 07/2009

    This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
    http://creativecommons.org/licenses/by-nc/3.0/
    ****************************************************************************************************************************/
    ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] (
        @search_text NVARCHAR(4000)
    ) RETURNS NVARCHAR(4000)
    AS
    BEGIN
        -------------------------------------------------------------------------------------------------
        -- PARENTHESES HANDLER
        -------------------------------------------------------------------------------------------------
        -- IF parentheses exist in the string
        IF (PATINDEX('%[()]%', @search_text) > 0)
        BEGIN
            DECLARE @pos_o INT, @pos_x INT, @charpos INT;
            DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL);
            SET @charpos = 1;

            -- split the string apart into a temp table
            WHILE @charpos <= LEN(@search_text)
            BEGIN
                INSERT @tt_char_split ([char])
                SELECT SUBSTRING(@search_text, @charpos, 1);

                SET @charpos = @charpos + 1;
            END

            -- while we have opening and closing parentheses
            WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(')
            AND   EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')')
            BEGIN
                -- Get the position of the first closing parenthesis
                SET @pos_x = (SELECT MIN(id)
                              FROM   @tt_char_split
                              WHERE  [char] = ')');

                -- Get the position of the first opening parenthesis
                SET @pos_o = (SELECT MAX(id)
                              FROM   @tt_char_split
                              WHERE  [char] = '('
                              AND    id < @pos_x);

                -- there is a valid pair of parentheses
                IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL)
                BEGIN
                    -- Escape this pair so we know they've been processed
                    UPDATE @tt_char_split
                    SET    [char] = '(('
                    WHERE  id = @pos_o;

                    UPDATE @tt_char_split
                    SET    [char] = '))'
                    WHERE  id = @pos_x;
                END
                ELSE BEGIN -- there is not a valid pair of parentheses
                    UPDATE @tt_char_split
                    SET    [char] = ''
                    WHERE  id IN (@pos_o, @pos_x);
                END
            END

            -- remove any remaining (invalid) parentheses
            UPDATE @tt_char_split
            SET    [char] = ''
            WHERE  [char] IN ('(', ')');

            SET @search_text = '';

            -- build new search string
            SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ')
            FROM   @tt_char_split
            ORDER  BY id;

            -- remove empty pairs, i.e. " ( ) "
            WHILE CHARINDEX(' ( ) ', @search_text) > 0
                SET @search_text = REPLACE(@search_text, ' ( ) ', '');
        END

        RETURN(@search_text);
    END
    GO
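    To give you an idea of what to expect (the exact spacing doesn’t matter, since the calling function splits on spaces anyway):

    SELECT dbo.fn_search_clause_parentheses_handler(N'((ipad OR iphone) NOT ipod))');
    -- returns something like:  (  ( ipad OR iphone )  NOT ipod )
    -- both valid pairs are kept and buffered with spaces; the orphaned trailing ")" is removed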

    So, it’s a scalar function which accepts a string, and returns that same string, but with its parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).

    Now, as I sit here watching the latest straight-to-DVD masterpiece that is “Barbie: A Fashion Fairytale” (hey, it’s a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.

    Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky. 😉

    Cheers

    DB Dave

  • Generate scripts to detach/attach all databases

    I haven’t had time to do any Denali testing since my last post, so instead I thought I’d share a script I wrote a while ago which simply generates a script to attach one or more databases using their current file paths (and optionally then detaches these databases).

    The reason I wrote this script in the first place is that we have a number of environments (i.e. dev, test, stage, reporting, etc.), and some of these databases are made up of quite a few database files (i.e. more than a dozen or so).
    So on the odd occasion when I’ve needed to detach a database, move some files around, and re-attach said database using the Management Studio GUI, I’ve found it frustrating having to manually browse to each file’s location over and over again. And I just prefer doing stuff in T/SQL over using the GUI – it makes me feel more in control, I think.

    So, with this script I could easily generate the “attach” code for each database (including all file paths), make any path or filename changes necessary, and re-attach all the databases with a single mouse click!

    USE [master];

    DECLARE @database NVARCHAR(200),
            @cmd NVARCHAR(1000),
            @attach_cmd NVARCHAR(4000),
            @file NVARCHAR(1000),
            @i INT;

    DECLARE dbname_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
        SELECT  RTRIM(LTRIM([name]))
        FROM    sysdatabases
        WHERE   [dbid] > 4  -- exclude system databases

    OPEN dbname_cur
    FETCH NEXT FROM dbname_cur INTO @database
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SELECT @i = 1;

        -- Initial attach command stub
        SET @attach_cmd = '-- ' + QUOTENAME(@database) + CHAR(10)
            + 'EXEC sp_attach_db @dbname = ''' + @database + '''' + CHAR(10);

        -- Get a list of files for this database
        DECLARE dbfiles_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
            SELECT  physical_name
            FROM    sys.master_files
            WHERE   database_id = db_id(@database)
            ORDER   BY [file_id];

        OPEN dbfiles_cur
        FETCH NEXT FROM dbfiles_cur INTO @file
        WHILE @@FETCH_STATUS = 0
        BEGIN
            SET @attach_cmd = @attach_cmd + '    ,@filename' + CAST(@i AS NVARCHAR(10)) + ' = ''' + @file + '''' + CHAR(10);
            SET @i = @i + 1;
            FETCH NEXT FROM dbfiles_cur INTO @file
        END

        CLOSE dbfiles_cur;
        DEALLOCATE dbfiles_cur;

        -- Output "attach" command.
        PRINT @attach_cmd;

        -- DETACH (uncomment the following line at your peril - IT WILL DETACH ALL USER DATABASES!)
        /*EXEC sp_detach_db @dbname = @database, @skipchecks = 'true';  -- change this to false if you want it to update stats before detaching*/

        FETCH NEXT FROM dbname_cur INTO @database
    END

    CLOSE dbname_cur;
    DEALLOCATE dbname_cur;
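    For each database, the PRINTed output looks something like this (the file paths here are obviously made up):

    -- [MyDatabase]
    EXEC sp_attach_db @dbname = 'MyDatabase'
        ,@filename1 = 'D:\SQLData\MyDatabase.mdf'
        ,@filename2 = 'E:\SQLLogs\MyDatabase_log.ldf'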

    The usual disclaimers apply; I’ve used this in a limited number of scenarios so I wouldn’t consider it thoroughly tested, and if you run it in a production environment and “accidentally” detach databases, or otherwise break something – don’t blame me.

    Cheers
    Dave