Tag: sql

  • Full-Text Search Operators – Part 1: Cleaning up parentheses in a string


    Part 1: Cleaning up parentheses in a string
    Part 2: Replacing alternate operators
    Part 3: Splitting the search string
    Part 4: Wrapping it up

    A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality.  These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.

    I looked around but couldn’t find anything implemented using T/SQL.  There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day.  So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:

    1. Must support the operators AND, OR, NOT, and NEAR
    2. Must allow for “exact-phrase” searches
    3. Should respect parentheses (for precedence control)
    4. Should be able to perform wildcard searches
    5. Operators must be case-sensitive (i.e. OR != or)

To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirement.  In this post I’m going to introduce the first piece of the puzzle (number 3 from the list above: handling parentheses).  What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).

Below is the function. I won’t explain it line by line since it’s pretty well commented, and relatively simple anyway.

IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL
    EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END ');
GO
/***************************************************************************************************************************
This is a supporting function, which is called by the function [fn_search_clause_get].
It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining
valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function.
Author : David Curlewis
Date : 07/2009

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
http://creativecommons.org/licenses/by-nc/3.0/
****************************************************************************************************************************/
ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] (
    @search_text NVARCHAR(4000)
) RETURNS NVARCHAR(4000)
AS
BEGIN
    -------------------------------------------------------------------------------------------------
    -- PARENTHESES HANDLER
    -------------------------------------------------------------------------------------------------
    -- IF parentheses exist in the string
    IF (PATINDEX('%[()]%', @search_text) > 0)
    BEGIN
        DECLARE @pos_o INT, @pos_x INT, @charpos INT;
        DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL);
        SET @charpos = 1;

        -- split the string apart into a table variable, one character per row
        WHILE @charpos <= LEN(@search_text)
        BEGIN
            INSERT @tt_char_split ([char])
            SELECT SUBSTRING(@search_text, @charpos, 1);

            SET @charpos = @charpos + 1;
        END

        -- while we have both opening and closing parentheses
        WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(')
        AND   EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')')
        BEGIN
            -- Get the position of the first closing parenthesis
            SET @pos_x = (SELECT MIN(id)
                          FROM   @tt_char_split
                          WHERE  [char] = ')');

            -- Get the position of the last opening parenthesis before it (i.e. its matching partner, if any)
            SET @pos_o = (SELECT MAX(id)
                          FROM   @tt_char_split
                          WHERE  [char] = '('
                          AND    id < @pos_x);

            -- there is a valid pair of parentheses
            IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL)
            BEGIN
                -- Escape this pair so we know they've been processed
                UPDATE @tt_char_split
                SET    [char] = '(('
                WHERE  id = @pos_o;

                UPDATE @tt_char_split
                SET    [char] = '))'
                WHERE  id = @pos_x;
            END
            ELSE BEGIN -- there is not a valid pair of parentheses
                UPDATE @tt_char_split
                SET    [char] = ''
                WHERE  id IN (@pos_o, @pos_x);
            END
        END

        -- remove any remaining (invalid) parentheses
        UPDATE @tt_char_split
        SET    [char] = ''
        WHERE  [char] IN ('(', ')');

        SET @search_text = '';

        -- build new search string, buffering valid parentheses with spaces
        SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ')
        FROM   @tt_char_split
        ORDER  BY id;

        -- remove empty pairs, i.e. " ( ) "
        WHILE CHARINDEX(' ( ) ', @search_text) > 0
            SET @search_text = REPLACE(@search_text, ' ( ) ', '');
    END

    RETURN(@search_text);
END
GO

So, it’s a scalar function which accepts a string, and returns that same string, but with its parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).
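Here’s a quick (hypothetical) example of calling it. The unmatched closing and opening parentheses should be stripped, while the valid pair is kept and padded with spaces:

SELECT [dbo].[fn_search_clause_parentheses_handler](N'(cats AND dogs)) OR (fish');
-- Returns something like: ' ( cats AND dogs )  OR fish'
-- (the orphaned ')' and '(' are gone; the valid pair is space-buffered for the splitter)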

Now, as I sit here watching the latest straight-to-DVD masterpiece that is “Barbie: A Fashion Fairytale” (hey, it’s a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.

Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky. 😉

    Cheers

    DB Dave

  • Generate scripts to detach/attach all databases

    I haven’t had time to do any Denali testing since my last post, so instead I thought I’d share a script I wrote a while ago which simply generates a script to attach one or more databases using their current file paths (and optionally then detaches these databases).

The reason I wrote this script in the first place was because we have a number of environments (i.e. dev, test, stage, reporting, etc.), and some of these databases are made up of quite a few database files (i.e. more than a dozen or so).
    So on the odd occasion when I’ve needed to detach a database, move some files around, and re-attach said database using the Management Studio GUI, I’ve found it frustrating having to manually browse to each file’s location over and over again. And I just prefer doing stuff in T/SQL over using the GUI – makes me feel more in control I think.

    So, with this script I could easily generate the “attach” code for each database (including all file paths), make any path or filename changes necessary, and re-attach all the databases with a single mouse click!

USE [master];

DECLARE @database   NVARCHAR(200),
        @cmd        NVARCHAR(1000),
        @attach_cmd NVARCHAR(4000),
        @file       NVARCHAR(1000),
        @i          INT;

DECLARE dbname_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
SELECT  RTRIM(LTRIM([name]))
FROM    sysdatabases
WHERE   [dbid] > 4  -- exclude system databases

OPEN dbname_cur
FETCH NEXT FROM dbname_cur INTO @database
WHILE @@FETCH_STATUS = 0
BEGIN
    SELECT @i = 1;

    -- Initial attach command stub
    SET @attach_cmd = '-- ' + QUOTENAME(@database) + CHAR(10)
        + 'EXEC sp_attach_db @dbname = ''' + @database + '''' + CHAR(10);

    -- Get a list of files for this database
    DECLARE dbfiles_cur CURSOR STATIC LOCAL FORWARD_ONLY FOR
    SELECT  physical_name
    FROM    sys.master_files
    WHERE   database_id = db_id(@database)
    ORDER   BY [file_id];

    OPEN dbfiles_cur
    FETCH NEXT FROM dbfiles_cur INTO @file
    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @attach_cmd = @attach_cmd + '    ,@filename' + CAST(@i AS NVARCHAR(10)) + ' = ''' + @file + '''' + CHAR(10);
        SET @i = @i + 1;
        FETCH NEXT FROM dbfiles_cur INTO @file
    END

    CLOSE dbfiles_cur;
    DEALLOCATE dbfiles_cur;

    -- Output "attach" command.
    PRINT @attach_cmd;

    -- DETACH (uncomment the following line at your peril - IT WILL DETACH ALL USER DATABASES!)
    /*EXEC sp_detach_db @dbname = @database, @skipchecks = 'true';  -- change this to 'false' if you want it to update stats before detaching*/

    FETCH NEXT FROM dbname_cur INTO @database
END

CLOSE dbname_cur;
DEALLOCATE dbname_cur;
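To give you an idea of what it spits out, here’s what the PRINTed output would look like for a hypothetical two-file database (the names and paths are made up):

-- [MyDatabase]
EXEC sp_attach_db @dbname = 'MyDatabase'
    ,@filename1 = 'D:\SQLData\MyDatabase.mdf'
    ,@filename2 = 'L:\SQLLogs\MyDatabase_log.ldf'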

    The usual disclaimers apply; I’ve used this in a limited number of scenarios so I wouldn’t consider it thoroughly tested, and if you run it in a production environment and “accidentally” detach databases, or otherwise break something – don’t blame me.

    Cheers
    Dave

  • Identifying unique indexes that aren’t defined as unique

    Confused? So am I, but after changing the title of this post about a dozen times this is what we’re left with… 🙂

    I was in the process of examining some indexes on our test Denali server (see previous posts) when I decided to check how many of our non-unique, non-clustered indexes are actually made up of a unique column (or a unique combination of columns in the case of a compound index). In other words, which non-clustered indexes could I convert to UNIQUE non-clustered indexes? I whipped up a script to do this and thought I’d share.

    If you’re wondering why I might want to do this; I’m interested to see what impact it has on index storage, as well as read performance. I’m by no means saying you should go and recreate all of your NC indexes as unique – remember that this script will return all indexes where the *current* data in the table is distinct. There are no guarantees that future inserts & updates won’t cause the data to become non-unique (and break your application).

Here’s the script. It’s pretty straightforward, and simply returns a list of tables/indexes at the moment, but could easily be changed to actually output the CREATE / DROP_EXISTING commands instead.

    SET NOCOUNT ON;
    DECLARE @i INT = 1,
    		@rc INT,
    		@cmd NVARCHAR(2000),
    		@tablename NVARCHAR(128),
    		@indexname NVARCHAR(128),
    		@collist NVARCHAR(1000),
    		@rowcount INT,
    		@rowcount_d INT;
    
    DECLARE @tt_ncidxs TABLE (rowid INT IDENTITY, [object_id] INT, [object_name] VARCHAR(128), [index_id] INT, index_name VARCHAR(128), [index_columns] VARCHAR(1000), is_unique BIT);
    
    WITH IndexColumns AS
    (
    	SELECT	DISTINCT
    			i.[object_id],
    			QUOTENAME(OBJECT_NAME(i.[object_id])) AS [object_name],
    			i.[index_id],
    			QUOTENAME(i.[name]) AS [index_name],
    			STUFF(
    					(
    					SELECT	', ' + QUOTENAME(COL_NAME([object_id], column_id)) AS 'data()'
    					FROM	sys.index_columns AS t2
    					WHERE	c.[object_id] = t2.[object_id]
    					AND		c.index_id = t2.index_id
    					FOR		XML PATH ('')
    					), 1, 2, ''
    				) AS 'index_columns'
    	FROM	sys.indexes AS i
    	JOIN	sys.index_columns AS c ON c.[index_id] = i.[index_id] AND c.[object_id] = i.[object_id]
    	JOIN	sys.objects AS so ON i.[object_id] = so.[object_id]
    	WHERE	i.is_unique = 0
    	AND		i.is_unique_constraint = 0
    	AND		so.is_ms_shipped = 0
    	AND		so.[type] = 'U'
    )
    INSERT	@tt_ncidxs ([object_id], [object_name], [index_id], [index_name], [index_columns])
    SELECT	[object_id],
    		[object_name],
    		[index_id],
    		[index_name],
    		[index_columns]
    FROM	IndexColumns
    
    --SELECT * FROM @tt_ncidxs
    SELECT @rc = @@ROWCOUNT;
    
    WHILE @i <= @rc
    BEGIN
    	SELECT @cmd = NULL, @rowcount = NULL, @rowcount_d = NULL, @tablename = NULL, @collist = NULL;
    
    	SELECT	@tablename = [object_name],
    			@indexname = index_name,
    			@collist = [index_columns]
    	FROM	@tt_ncidxs
    	WHERE	rowid = @i;
    
    	RAISERROR('Table %s column list: %s', 10, 1, @tablename, @collist) WITH NOWAIT;
    
    	SET @cmd = 'SELECT @x_rowcount = COUNT(*) FROM ' + @tablename + ' WITH (NOLOCK);'
    	EXEC sp_executesql @cmd, N'@x_rowcount INT OUTPUT', @x_rowcount = @rowcount OUTPUT;
    
    	SET @cmd = 'SELECT @x_rowcount_d = COUNT(*) FROM (SELECT DISTINCT ' + @collist + ' FROM ' + @tablename + ' WITH (NOLOCK) ) x;'
    	EXEC sp_executesql @cmd, N'@x_rowcount_d INT OUTPUT', @x_rowcount_d = @rowcount_d OUTPUT;
    
    	RAISERROR('Index %s on table %s selectivity = %i/%i', 10, 1, @indexname, @tablename, @rowcount_d, @rowcount) WITH NOWAIT;
    
    	IF (ISNULL(@rowcount, 1) = ISNULL(@rowcount_d, 2)) -- i.e. column values are (currently) distinct/unique
    	UPDATE @tt_ncidxs
    	SET is_unique = 1
    	WHERE rowid = @i;
    
    	SET @i += 1;
    END
    
    -- Get a list of non-unique nonclustered indexes whose index columns are currently distinct.
    -- These *may* be candidates for conversion to UNIQUE nonclustered indexes instead.
    -- NB: PLEASE MAKE 100% SURE THAT THE COLUMNS ARE ALWAYS GOING TO BE UNIQUE BEFORE CREATING A UNIQUE INDEX
    SELECT	*
    FROM	@tt_ncidxs
    WHERE	is_unique = 1
    ORDER	BY [object_name], [index_id]
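And just to illustrate the final step: if the script flags an index as a candidate, recreating it in place as a unique index looks something like this (table, index, and column names here are made up):

CREATE UNIQUE NONCLUSTERED INDEX [IX_Member_Email]
ON [dbo].[Member] ([Email])
WITH (DROP_EXISTING = ON);  -- rebuild the existing index in place, now enforcing uniqueness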

    Cheers
    DB Dave

  • SQL Breakfast

This morning we (i.e. my team and I) tried something we’ve spoken about for some time, but hadn’t got around to doing until now: a “Tech Breakfast”.

So I stopped at a bakery on the way to work to pick up some yummy baked goods, and all 3 of us piled into a meeting room at 07:30 with our individual beverage of choice in hand, and watched Kimberly Tripp’s “Indexing Internals” MCM presentation and demo.

    It went down pretty well, so I think we’ll make it a regular thing (maybe every 2nd Friday morning).  If you work in a team, and feel like mixing things up a little, give something like this a try.  At the very least it exposes people to things they might not have otherwise learnt themselves, and also sparks discussion in the team about the material being viewed, and how it may be relevant in their own environments or projects.

All in all, not a bad way to kick off a Friday. 🙂

    Cheers,
    DB Dave

  • Our SQL Server “Denali” experiences – Part 2

    Where has the week gone? I finally got round to continuing my SQL Denali testing today – with very promising results in fact!

    To give you an idea of the environment (which I brushed over in the previous post) and a little more context, we have a number of SQL servers which sit between our web servers, and our main core databases. These act as “read-only” search, browse and caching servers for the Trade Me website. They each hold the same data, have the same configuration, and run on the same hardware, which makes it relatively easy to add new servers (i.e. scale horizontally) when necessary.

    Add to the mix some clever networking kit, and a few in-house developed bits & pieces, and we can finely control the amount (and type) of load on each individual server. Very handy for testing new hardware and software, service packs, etc.

So, just after lunchtime this afternoon, I very gingerly put 1% of search load on the Denali test box. To cut a long story short, over the course of the next few hours I increased load until we were running 50% of all search load on the one Denali test server. Trust me when I say that this is pretty damn awesome. 🙂

The Denali test server is exactly the same hardware-wise as the existing SQL 2005 servers. We’ve tried on a few occasions to upgrade to SQL 2008, but a pretty well documented full-text related performance bottleneck meant we could never get more than 5-10% of search load onto a single box. The issue was the full-text merge process causing excessive locking on the underlying indexed table – which in our environment is constantly changing, as well as being constantly queried – so we’ve just stuck with 2005 thus far.

Although everything went very well, there were a few weird things I ran into while getting the server ready. Firstly, I needed to shrink a tempdb data file that had blown out, but the shrinkfile operation never completed (and actually timed out once or twice after around 5 minutes, which I’ve never come across before). Because I couldn’t shrink the data file I thought I’d just restart the SQL service, which would recreate tempdb at its original (i.e. smaller) size – but this wasn’t the case. The initial size was correctly set, yet after a restart of the SQL service the tempdb data file was still at its blown-out size. I’ll need to investigate this further next week, so will post any new findings.
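For the record, the shrink itself was nothing fancy; something along these lines (the logical file name is hypothetical – check sys.database_files in tempdb for yours):

USE [tempdb];
DBCC SHRINKFILE (N'tempdev', 1024);  -- target size in MB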

    That’s all for this week’s testing.
    Cheers
    DB Dave

  • Snapshot isolation and tempdb growth

Over the weekend we were alerted to a server running low on disk space. Not particularly exciting in itself, but the fact that it was happening on a relatively newly built SQL 2008 database server with a small database (~2.5GB) was interesting.

This particular database belongs to one of our newer and smaller websites, and has READ_COMMITTED_SNAPSHOT enabled (we did this to prevent excessive blocking, which we ran into and weren’t able to control any other way, since this site is built using an ORM which generates the SQL commands on the fly).
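For anyone curious, enabling it is a one-liner; something like the below (hypothetical database name – and note it needs exclusive access to the database momentarily, hence the ROLLBACK clause):

ALTER DATABASE [MyWebsiteDB] SET READ_COMMITTED_SNAPSHOT ON WITH ROLLBACK IMMEDIATE;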

Anyway, getting back to the point: disk space was low because the tempdb data file was using around 15GB of space. I assumed that someone was explicitly doing something to cause the blowout (index rebuilds, gnarly query, etc.) but I couldn’t find any culprits straight away.

I then checked the sys.dm_tran_session_transactions DMV (after reading through one of Raj’s posts), which showed one particularly long-running session (over 14 days in fact!). There were no open transactions that I could see, and the session itself had been suspended for practically all of those 14 days, so the reason I hadn’t seen it earlier was that most of my monitoring queries were inner-joining to sys.dm_tran_session_transactions, which didn’t have a corresponding row. The query had a shared schema lock on 2 tables though, which is why the version store wasn’t flushing, I guess.
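Lesson learnt: when hunting for this kind of thing, use an outer join so that sessions without a row in the transaction DMVs still show up. A rough sketch of the sort of check I’ll use from now on:

SELECT  s.session_id,
        s.login_time,
        s.status,
        s.last_request_end_time,
        t.transaction_id
FROM    sys.dm_exec_sessions AS s
LEFT JOIN sys.dm_tran_session_transactions AS t ON t.session_id = s.session_id
WHERE   s.is_user_process = 1
ORDER   BY s.login_time;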

I had thought that the version store was only maintained while there was an open transaction, though – so either there was an open transaction and I simply didn’t see it (I ran “DBCC OPENTRAN” with no results), or it was enough that the session held schema locks. Anyone have any ideas on this?
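On a related note, one thing I’ll be keeping an eye on from now on is the version store size itself, which you can get from the performance counter DMV (a sketch; the counter names in this DMV are space-padded, hence the LIKE):

SELECT  [object_name], counter_name, cntr_value AS size_kb
FROM    sys.dm_os_performance_counters
WHERE   counter_name LIKE 'Version Store Size%';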

    The last command executed by the SPID was a relatively common & trivial one from the website, so it looks like it was an orphaned connection from the application – although how that happened we’re not sure. I killed the SPID and was then able to shrink the tempdb down to its usual level.

    I haven’t had much exposure to snapshot isolation, but at least now know to check for this if we have problems with tempdb growth in future. 🙂

    Cheers
    DB Dave