Tag: sql

  • Full-Text Search Operators – Part 1: Cleaning up parentheses in a string


    Part 1: Cleaning up parentheses in a string
    Part 2: Replacing alternate operators
    Part 3: Splitting the search string
    Part 4: Wrapping it up

    A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality.  These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.

    I looked around but couldn’t find anything implemented using T/SQL.  There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day.  So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:

    1. Must support the operators AND, OR, NOT, and NEAR
    2. Must allow for “exact-phrase” searches
    3. Should respect parentheses (for precedence control)
    4. Should be able to perform wildcard searches
    5. Operators must be case-sensitive (i.e. OR != or)

To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirement.  In this post I’m going to introduce the first piece of the puzzle (number 3 from the list above: handling parentheses).  What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).

Below is the function. I won’t explain it line by line since it’s pretty well commented, and relatively simple anyway.

IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL
    EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END ');
GO
/***************************************************************************************************************************
This is a supporting function, which is called by the function [fn_search_clause_get].
It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining
valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function.
Author : David Curlewis
Date : 07/2009

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
http://creativecommons.org/licenses/by-nc/3.0/
****************************************************************************************************************************/
ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] (
    @search_text NVARCHAR(4000)
) RETURNS NVARCHAR(4000)
AS
BEGIN
    -------------------------------------------------------------------------------------------------
    -- PARENTHESES HANDLER
    -------------------------------------------------------------------------------------------------
    -- IF parentheses exist in the string
    IF (PATINDEX('%[()]%', @search_text) > 0)
    BEGIN
        DECLARE @pos_o INT, @pos_x INT, @charpos INT;
        DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL);
        SET @charpos = 1;

        -- split the string apart into a table variable, one character per row
        WHILE @charpos <= LEN(@search_text)
        BEGIN
            INSERT @tt_char_split ([char])
            SELECT SUBSTRING(@search_text, @charpos, 1);

            SET @charpos = @charpos + 1;
        END

        -- while we have both opening and closing parentheses
        WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(')
        AND   EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')')
        BEGIN
            -- Get the position of the first closing parenthesis
            SET @pos_x = (SELECT MIN(id)
                          FROM   @tt_char_split
                          WHERE  [char] = ')');

            -- Get the position of the last opening parenthesis before it (i.e. its matching partner, if any)
            SET @pos_o = (SELECT MAX(id)
                          FROM   @tt_char_split
                          WHERE  [char] = '('
                          AND    id < @pos_x);

            -- there is a valid pair of parentheses
            IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL)
            BEGIN
                -- Escape this pair so we know they've been processed
                UPDATE @tt_char_split
                SET    [char] = '(('
                WHERE  id = @pos_o;

                UPDATE @tt_char_split
                SET    [char] = '))'
                WHERE  id = @pos_x;
            END
            ELSE BEGIN -- there is not a valid pair of parentheses
                UPDATE @tt_char_split
                SET    [char] = ''
                WHERE  id IN (@pos_o, @pos_x);
            END
        END

        -- remove any remaining (invalid) parentheses
        UPDATE @tt_char_split
        SET    [char] = ''
        WHERE  [char] IN ('(', ')');

        SET @search_text = '';

        -- build new search string, buffering valid parentheses with spaces
        SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ')
        FROM   @tt_char_split
        ORDER  BY id;

        -- remove empty pairs, i.e. " ( ) "
        WHILE CHARINDEX(' ( ) ', @search_text) > 0
            SET @search_text = REPLACE(@search_text, ' ( ) ', '');
    END

    RETURN(@search_text);
END
GO

So, it’s a scalar function which accepts a string, and returns that same string, but with its parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).
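Here’s a quick (hypothetical) example of calling it. The unmatched closing and opening parentheses should be stripped, while the valid pair is kept and padded with spaces:

SELECT [dbo].[fn_search_clause_parentheses_handler](N'(cats AND dogs)) OR (fish');
-- Returns something like: ' ( cats AND dogs )  OR fish'
-- (the orphaned ')' and '(' are gone; the valid pair is space-buffered for the splitter)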

Now, as I sit here watching the latest straight-to-DVD masterpiece that is “Barbie: A Fashion Fairytale” (hey, it’s a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.

Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky. 😉

    Cheers

    DB Dave

  • Generate scripts to detach/attach all databases

    I haven’t had time to do any Denali testing since my last post, so instead I thought I’d share a script I wrote a while ago which simply generates a script to attach one or more databases using their current file paths (and optionally then detaches these databases).

The reason I wrote this script in the first place was because we have a number of environments (i.e. dev, test, stage, reporting, etc.), and some of these databases are made up of quite a few database files (i.e. more than a dozen or so).
    So on the odd occasion when I’ve needed to detach a database, move some files around, and re-attach said database using the Management Studio GUI, I’ve found it frustrating having to manually browse to each file’s location over and over again. And I just prefer doing stuff in T/SQL over using the GUI – makes me feel more in control I think.

    So, with this script I could easily generate the “attach” code for each database (including all file paths), make any path or filename changes necessary, and re-attach all the databases with a single mouse click!

USE [master];

DECLARE @database   NVARCHAR(200),
        @cmd        NVARCHAR(1000),
        @attach_cmd NVARCHAR(4000),
        @file       NVARCHAR(1000),
        @i          INT;

DECLARE dbname_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
SELECT  RTRIM(LTRIM([name]))
FROM    sysdatabases
WHERE   [dbid] > 4  -- exclude system databases

OPEN dbname_cur
FETCH NEXT FROM dbname_cur INTO @database
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @i = 1;

    -- Initial attach command stub
    SET @attach_cmd = '-- ' + QUOTENAME(@database) + CHAR(10)
        + 'EXEC sp_attach_db @dbname = ''' + @database + '''' + CHAR(10);

    -- Get a list of files for this database
    DECLARE dbfiles_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
    SELECT  physical_name
    FROM    sys.master_files
    WHERE   database_id = db_id(@database)
    ORDER   BY [file_id];

    OPEN dbfiles_cur
    FETCH NEXT FROM dbfiles_cur INTO @file
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @attach_cmd = @attach_cmd + '    ,@filename' + CAST(@i AS NVARCHAR(10)) + ' = ''' + @file + '''' + CHAR(10);
        SET @i = @i + 1;
        FETCH NEXT FROM dbfiles_cur INTO @file
    END

    CLOSE dbfiles_cur;
    DEALLOCATE dbfiles_cur;

    -- Output "attach" command.
    PRINT @attach_cmd;

    -- DETACH (uncomment the following line at your peril - IT WILL DETACH ALL USER DATABASES!)
    /*EXEC sp_detach_db @dbname = @database, @skipchecks = 'true';  -- change this to 'false' if you want it to update stats before detaching*/

    FETCH NEXT FROM dbname_cur INTO @database
END

CLOSE dbname_cur;
DEALLOCATE dbname_cur;
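To give you an idea of what it spits out, here’s what the PRINTed output would look like for a hypothetical two-file database (the names and paths are made up):

-- [MyDatabase]
EXEC sp_attach_db @dbname = 'MyDatabase'
    ,@filename1 = 'D:\SQLData\MyDatabase.mdf'
    ,@filename2 = 'L:\SQLLogs\MyDatabase_log.ldf'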

    The usual disclaimers apply; I’ve used this in a limited number of scenarios so I wouldn’t consider it thoroughly tested, and if you run it in a production environment and “accidentally” detach databases, or otherwise break something – don’t blame me.

    Cheers
    Dave

  • Identifying unique indexes that aren’t defined as unique

    Confused? So am I, but after changing the title of this post about a dozen times this is what we’re left with… 🙂

    I was in the process of examining some indexes on our test Denali server (see previous posts) when I decided to check how many of our non-unique, non-clustered indexes are actually made up of a unique column (or a unique combination of columns in the case of a compound index). In other words, which non-clustered indexes could I convert to UNIQUE non-clustered indexes? I whipped up a script to do this and thought I’d share.

    If you’re wondering why I might want to do this; I’m interested to see what impact it has on index storage, as well as read performance. I’m by no means saying you should go and recreate all of your NC indexes as unique – remember that this script will return all indexes where the *current* data in the table is distinct. There are no guarantees that future inserts & updates won’t cause the data to become non-unique (and break your application).

Here’s the script. It’s pretty straightforward, and simply returns a list of tables/indexes at the moment, but could easily be changed to actually output the CREATE / DROP_EXISTING commands instead.

    SET NOCOUNT ON;
    DECLARE @i INT = 1,
    		@rc INT,
    		@cmd NVARCHAR(2000),
    		@tablename NVARCHAR(128),
    		@indexname NVARCHAR(128),
    		@collist NVARCHAR(1000),
    		@rowcount INT,
    		@rowcount_d INT;
    
    DECLARE @tt_ncidxs TABLE (rowid INT IDENTITY, [object_id] INT, [object_name] VARCHAR(128), [index_id] INT, index_name VARCHAR(128), [index_columns] VARCHAR(1000), is_unique BIT);
    
    WITH IndexColumns AS
    (
    	SELECT	DISTINCT
    			i.[object_id],
    			QUOTENAME(OBJECT_NAME(i.[object_id])) AS [object_name],
    			i.[index_id],
    			QUOTENAME(i.[name]) AS [index_name],
    			STUFF(
    					(
    					SELECT	', ' + QUOTENAME(COL_NAME([object_id], column_id)) AS 'data()'
    					FROM	sys.index_columns AS t2
    					WHERE	c.[object_id] = t2.[object_id]
    					AND		c.index_id = t2.index_id
    					FOR		XML PATH ('')
    					), 1, 2, ''
    				) AS 'index_columns'
    	FROM	sys.indexes AS i
    	JOIN	sys.index_columns AS c ON c.[index_id] = i.[index_id] AND c.[object_id] = i.[object_id]
    	JOIN	sys.objects AS so ON i.[object_id] = so.[object_id]
    	WHERE	i.is_unique = 0
    	AND		i.is_unique_constraint = 0
    	AND		so.is_ms_shipped = 0
    	AND		so.[type] = 'U'
    )
    INSERT	@tt_ncidxs ([object_id], [object_name], [index_id], [index_name], [index_columns])
    SELECT	[object_id],
    		[object_name],
    		[index_id],
    		[index_name],
    		[index_columns]
    FROM	IndexColumns
    
    --SELECT * FROM @tt_ncidxs
    SELECT @rc = @@ROWCOUNT;
    
    WHILE @i <= @rc
    BEGIN
    	SELECT @cmd = NULL, @rowcount = NULL, @rowcount_d = NULL, @tablename = NULL, @collist = NULL;
    
    	SELECT	@tablename = [object_name],
    			@indexname = index_name,
    			@collist = [index_columns]
    	FROM	@tt_ncidxs
    	WHERE	rowid = @i;
    
    	RAISERROR('Table %s column list: %s', 10, 1, @tablename, @collist) WITH NOWAIT;
    
    	SET @cmd = 'SELECT @x_rowcount = COUNT(*) FROM ' + @tablename + ' WITH (NOLOCK);'
    	EXEC sp_executesql @cmd, N'@x_rowcount INT OUTPUT', @x_rowcount = @rowcount OUTPUT;
    
    	SET @cmd = 'SELECT @x_rowcount_d = COUNT(*) FROM (SELECT DISTINCT ' + @collist + ' FROM ' + @tablename + ' WITH (NOLOCK) ) x;'
    	EXEC sp_executesql @cmd, N'@x_rowcount_d INT OUTPUT', @x_rowcount_d = @rowcount_d OUTPUT;
    
    	RAISERROR('Index %s on table %s selectivity = %i/%i', 10, 1, @indexname, @tablename, @rowcount_d, @rowcount) WITH NOWAIT;
    
    	IF (ISNULL(@rowcount, 1) = ISNULL(@rowcount_d, 2)) -- i.e. column values are (currently) distinct/unique
    	UPDATE @tt_ncidxs
    	SET is_unique = 1
    	WHERE rowid = @i;
    
    	SET @i += 1;
    END
    
    -- Get a list of non-unique nonclustered indexes whose index columns are currently distinct.
    -- These *may* be candidates for conversion to UNIQUE nonclustered indexes instead.
    -- NB: PLEASE MAKE 100% SURE THAT THE COLUMNS ARE ALWAYS GOING TO BE UNIQUE BEFORE CREATING A UNIQUE INDEX
    SELECT	*
    FROM	@tt_ncidxs
    WHERE	is_unique = 1
    ORDER	BY [object_name], [index_id]
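And just to illustrate the final step: if the script flags an index as a candidate, recreating it in place as a unique index looks something like this (table, index, and column names here are made up):

CREATE UNIQUE NONCLUSTERED INDEX [IX_Member_Email]
ON [dbo].[Member] ([Email])
WITH (DROP_EXISTING = ON);  -- rebuild the existing index in place, now enforcing uniqueness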

    Cheers
    DB Dave

  • SQL Breakfast

This morning we (i.e. my team and I) tried something we’ve spoken about for some time, but hadn’t got around to doing until now: a “Tech Breakfast”.

So I stopped at a bakery on the way to work to pick up some yummy baked goods, and all 3 of us piled into a meeting room at 07:30 with our individual beverage of choice in hand, and watched Kimberly Tripp’s “Indexing Internals” MCM presentation and demo.

    It went down pretty well, so I think we’ll make it a regular thing (maybe every 2nd Friday morning).  If you work in a team, and feel like mixing things up a little, give something like this a try.  At the very least it exposes people to things they might not have otherwise learnt themselves, and also sparks discussion in the team about the material being viewed, and how it may be relevant in their own environments or projects.

All in all, not a bad way to kick off a Friday. 🙂

    Cheers,
    DB Dave

  • Our SQL Server “Denali” experiences – Part 2

    Where has the week gone? I finally got round to continuing my SQL Denali testing today – with very promising results in fact!

    To give you an idea of the environment (which I brushed over in the previous post) and a little more context, we have a number of SQL servers which sit between our web servers, and our main core databases. These act as “read-only” search, browse and caching servers for the Trade Me website. They each hold the same data, have the same configuration, and run on the same hardware, which makes it relatively easy to add new servers (i.e. scale horizontally) when necessary.

    Add to the mix some clever networking kit, and a few in-house developed bits & pieces, and we can finely control the amount (and type) of load on each individual server. Very handy for testing new hardware and software, service packs, etc.

So, just after lunchtime this afternoon, I very gingerly put 1% of search load on the Denali test box. To cut a long story short, over the course of the next few hours I increased load until we were running 50% of all search load on the one Denali test server. Trust me when I say that this is pretty damn awesome. 🙂

The Denali test server is exactly the same hardware-wise as the existing SQL 2005 servers. We’ve tried on a few occasions to upgrade to SQL 2008, but a pretty well documented full-text related performance bottleneck meant we could never get more than 5-10% of search load onto a single box. The issue was the full-text merge process causing excessive locking on the underlying indexed table – which in our environment is constantly changing, as well as being constantly queried – so we’ve just stuck with 2005 thus far.

Although everything went very well, there were a few weird things I ran into while getting the server ready. Firstly, I needed to shrink a tempdb data file that had blown out, but the shrinkfile operation never completed (and actually timed out once or twice after around 5 minutes, which I’ve never come across before). Because I couldn’t shrink the data file I thought I’d just restart the SQL service, which would recreate tempdb at its original (i.e. smaller) size – but this wasn’t the case. The initial size was correctly set, yet after a restart of the SQL service the tempdb data file was still at its blown-out size. I’ll need to investigate this further next week, so will post any new findings.
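For the record, the shrink itself was nothing fancy; something along these lines (the logical file name is hypothetical – check sys.database_files in tempdb for yours):

USE [tempdb];
DBCC SHRINKFILE (N'tempdev', 1024);  -- target size in MB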

    That’s all for this week’s testing.
    Cheers
    DB Dave

  • Snapshot isolation and tempdb growth

Over the weekend we were alerted to a server running low on disk space. Not particularly exciting in itself, but the fact that it was happening on a relatively newly built SQL 2008 database server with a small database (~2.5GB) was interesting.

This particular database belongs to one of our newer and smaller websites, and has READ_COMMITTED_SNAPSHOT enabled (we did this to prevent excessive blocking, which we ran into and weren’t able to control any other way, since this site is built using an ORM which generates the SQL commands on the fly).
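For anyone curious, enabling it is a one-liner; something like the below (hypothetical database name – and note it needs exclusive access to the database momentarily, hence the ROLLBACK clause):

ALTER DATABASE [MyWebsiteDB] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;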

Anyway, getting back to the point: disk space was low because the tempdb data file was using around 15GB of space. I assumed that someone was explicitly doing something to cause the blowout (index rebuilds, gnarly query, etc.) but I couldn’t find any culprits straight away.

I then checked the sys.dm_tran_session_transactions DMV (after reading through one of Raj’s posts), which showed one particularly long-running session (over 14 days in fact!). There were no open transactions that I could see, and the session itself had been suspended for practically all of those 14 days, so the reason I hadn’t seen it earlier was that most of my monitoring queries were inner-joining to sys.dm_tran_session_transactions, which didn’t have a corresponding row. The query had a shared schema lock on 2 tables though, which is why the version store wasn’t flushing, I guess.
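Lesson learnt: when hunting for this kind of thing, use an outer join so that sessions without a row in the transaction DMVs still show up. A rough sketch of the sort of check I’ll use from now on:

SELECT  s.session_id,
        s.login_time,
        s.status,
        s.last_request_end_time,
        t.transaction_id
FROM    sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_tran_session_transactions AS t ON t.session_id = s.session_id
WHERE   s.is_user_process = 1
ORDER   BY s.login_time;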

I had thought that the version store was only maintained while there was an open transaction, though – so either there was an open transaction and I simply didn’t see it (I ran “DBCC OPENTRAN” with no results), or it was enough that the session held schema locks. Anyone have any ideas on this?
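On a related note, one thing I’ll be keeping an eye on from now on is the version store size itself, which you can get from the performance counter DMV (a sketch; the counter names in this DMV are space-padded, hence the LIKE):

SELECT  [object_name], counter_name, cntr_value AS size_kb
FROM    sys.dm_os_performance_counters
WHERE   counter_name LIKE 'Version Store Size%';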

    The last command executed by the SPID was a relatively common & trivial one from the website, so it looks like it was an orphaned connection from the application – although how that happened we’re not sure. I killed the SPID and was then able to shrink the tempdb down to its usual level.

    I haven’t had much exposure to snapshot isolation, but at least now know to check for this if we have problems with tempdb growth in future. 🙂

    Cheers
    DB Dave