Full-Text Search Operators – Part 1: Cleaning up parentheses in a string

BooleanBeards

Part 1: Cleaning up parentheses in a string
Part 2: Replacing alternate operators
Part 3: Splitting the search string
Part 4: Wrapping it up

A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality.  These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.

I looked around but couldn’t find anything implemented using T/SQL.  There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day.  So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:

  1. Must support the operators AND, OR, NOT, and NEAR
  2. Must allow for “exact-phrase” searches
  3. Should respect parentheses (for precedence control)
  4. Should be able to perform wildcard searches
  5. Operators must be case-sensitive (i.e. OR != or)

To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirements.  In this post I’m going to introduce the first piece of the puzzle (which is number 3 from the list above; handling parentheses).  What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).

Below is the function. I won’t explain it line by line since its pretty well commented, and relatively simple anyway.

IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL
EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END ');
GO
/***************************************************************************************************************************
This is a supporting function, which is called by the function [fn_search_clause_get].
It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining
valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function.
Author : David Curlewis
Date : 07/2009

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License
http://creativecommons.org/licenses/by-nc/3.0/
****************************************************************************************************************************/
ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] (
@search_text NVARCHAR(4000)
) RETURNS NVARCHAR(4000)
AS
BEGIN
-------------------------------------------------------------------------------------------------
-- PARENTHESES HANDLER
-------------------------------------------------------------------------------------------------
-- IF parentheses exist in the string
IF (PATINDEX('%[()]%', @search_text) > 0)
BEGIN
DECLARE @pos_o INT, @pos_x INT, @charpos INT;
DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL);
SET @charpos = 1;

-- split the string apart into a temp table
WHILE @charpos <= LEN(@search_text)
BEGIN
INSERT @tt_char_split ([char])
SELECT SUBSTRING(@search_text, @charpos, 1);

SET @charpos = @charpos + 1;
END

-- while we have opening and closing parentheses
WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(')
AND EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')')
BEGIN
-- Get the position of the first closing parenthesis
SET @pos_x = ( SELECT MIN(id)
FROM @tt_char_split
WHERE [char] = ')');

-- Get the position of the first opening parenthesis
SET @pos_o = ( SELECT MAX(id)
FROM @tt_char_split
WHERE [char] = '('
AND id < @pos_x);

-- there is a valid pair of parentheses
IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL)
BEGIN
-- Escape this pair so we know they've been processed
UPDATE @tt_char_split
SET [char] = '(('
WHERE id = @pos_o;

UPDATE @tt_char_split
SET [char] = '))'
WHERE id = @pos_x;
END
ELSE BEGIN -- there is not a valid pair of parentheses
UPDATE @tt_char_split
SET [char] = ''
WHERE id IN (@pos_o, @pos_x);
END
END

-- remove any remaining (invalid) parentheses
UPDATE @tt_char_split
SET [char] = ''
WHERE [char] IN ('(', ')');

SET @search_text = '';

-- build new search string
SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ')
FROM @tt_char_split
ORDER BY id;

-- remove empty pairs, i.e. " ( ) "
WHILE CHARINDEX(' ( ) ', @search_text) > 0
SET @search_text = REPLACE(@search_text, ' ( ) ', '');
END

RETURN(@search_text);
END
GO

So, its a scalar function which accepts a string, and returns that same string, but with it’s parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).

Now, as I sit here watching the latest straight-to-dvd masterpiece that is “Barbie: A fashion fairy-tale(hey, its a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.

Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky. Winking smile

Cheers

DB Dave

4 thoughts on “Full-Text Search Operators – Part 1: Cleaning up parentheses in a string

  1. Pingback: Removing accented and illegal characters from a string | David Curlewis

  2. Pingback: Full-Text Search Operators – Part 2: Splitting the search string | David Curlewis

  3. Pingback: Full-Text Search Operators – Part 3: Splitting the search string | David Curlewis

  4. Pingback: Full-Text Search Operators – Part 4: Wrapping it up | David Curlewis

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.