Part 1: Cleaning up parentheses in a string
Part 2: Replacing alternate operators
Part 3: Splitting the search string
Part 4: Wrapping it up
A few years ago I wanted to have a crack at improving one of our websites’ search by providing “search operator” functionality. These are basically special words or characters (i.e. operators) which provide Boolean search capability; e.g. AND, OR, NOT.
I looked around but couldn’t find anything implemented using T/SQL. There were a few .Net solutions, but I’m not a developer, and can only handle very basic .Net on a good day. So I had a crack at writing my own. I started off with the Google cheat-sheet as my initial list of requirements, which was basically as follows:
- Must support the operators AND, OR, NOT, and NEAR
- Must allow for “exact-phrase” searches
- Should respect parentheses (for precedence control)
- Should be able to perform wildcard searches
- Operators must be case-sensitive (i.e. OR != or)
To attack this problem, I decided to split it into multiple chunks, and write separate blocks of code to meet each requirements. In this post I’m going to introduce the first piece of the puzzle (which is number 3 from the list above; handling parentheses). What this means is that given a string containing parentheses, I need to output the same string with valid parentheses intact (and obviously invalid ones removed).
Below is the function. I won’t explain it line by line since its pretty well commented, and relatively simple anyway.
IF OBJECT_ID('[dbo].[fn_search_clause_parentheses_handler]') IS NULL EXEC('CREATE FUNCTION [dbo].[fn_search_clause_parentheses_handler] () RETURNS INT AS BEGIN RETURN(0); END '); GO /*************************************************************************************************************************** This is a supporting function, which is called by the function [fn_search_clause_get]. It accepts a string input, and outputs the same string, but with any invalid parentheses removed, and all remaining valid parentheses buffered (by a space on each side) so that they are split correctly by the calling function. Author : David Curlewis Date : 07/2009 This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License http://creativecommons.org/licenses/by-nc/3.0/ ****************************************************************************************************************************/ ALTER FUNCTION [dbo].[fn_search_clause_parentheses_handler] ( @search_text NVARCHAR(4000) ) RETURNS NVARCHAR(4000) AS BEGIN ------------------------------------------------------------------------------------------------- -- PARENTHESES HANDLER ------------------------------------------------------------------------------------------------- -- IF parentheses exist in the string IF (PATINDEX('%[()]%', @search_text) > 0) BEGIN DECLARE @pos_o INT, @pos_x INT, @charpos INT; DECLARE @tt_char_split TABLE (id SMALLINT IDENTITY(1,1), [char] VARCHAR(2) NOT NULL); SET @charpos = 1; -- split the string apart into a temp table WHILE @charpos <= LEN(@search_text) BEGIN INSERT @tt_char_split ([char]) SELECT SUBSTRING(@search_text, @charpos, 1); SET @charpos = @charpos + 1; END -- while we have opening and closing parentheses WHILE EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = '(') AND EXISTS(SELECT TOP 1 1 FROM @tt_char_split WHERE [char] = ')') BEGIN -- Get the position of the first closing parenthesis SET @pos_x = ( SELECT MIN(id) FROM @tt_char_split WHERE [char] = ')'); -- Get the position of the first opening parenthesis SET @pos_o = ( SELECT MAX(id) FROM @tt_char_split WHERE [char] = '(' AND id < @pos_x); -- there is a valid pair of parentheses IF (@pos_o IS NOT NULL AND @pos_x IS NOT NULL) BEGIN -- Escape this pair so we know they've been processed UPDATE @tt_char_split SET [char] = '((' WHERE id = @pos_o; UPDATE @tt_char_split SET [char] = '))' WHERE id = @pos_x; END ELSE BEGIN -- there is not a valid pair of parentheses UPDATE @tt_char_split SET [char] = '' WHERE id IN (@pos_o, @pos_x); END END -- remove any remaining (invalid) parentheses UPDATE @tt_char_split SET [char] = '' WHERE [char] IN ('(', ')'); SET @search_text = ''; -- build new search string SELECT @search_text = @search_text + REPLACE(REPLACE([char], '((', ' ( '), '))', ' ) ') FROM @tt_char_split ORDER BY id; -- remove empty pairs, i.e. " ( ) " WHILE CHARINDEX(' ( ) ', @search_text) > 0 SET @search_text = REPLACE(@search_text, ' ( ) ', ''); END RETURN(@search_text); END GO
So, its a scalar function which accepts a string, and returns that same string, but with it’s parentheses tidied up and “buffered” (i.e. I wrap them in spaces to make it easier for the calling function to then split the string into its component parts).
Now, as I sit here watching the latest straight-to-dvd masterpiece that is “Barbie: A fashion fairy-tale” (hey, its a Sunday afternoon and I have 3 daughters – you’d lose the battle too!), I’m having a hard time getting the code to post properly, so if you have any issues, flick me a note in the comments below and I’ll sort something else out.
Give it a try and let me know if you find any bugs (or make any improvements!). I’ll follow up soon with posts describing the rest of the process, but for now I’ve had enough of Barbie, and think I deserve a whisky.
Cheers
DB Dave
Pingback: Removing accented and illegal characters from a string | David Curlewis
Pingback: Full-Text Search Operators – Part 2: Splitting the search string | David Curlewis
Pingback: Full-Text Search Operators – Part 3: Splitting the search string | David Curlewis
Pingback: Full-Text Search Operators – Part 4: Wrapping it up | David Curlewis