Full-Text Search Operators – Part 4: Wrapping it up

"iPod nano 4Gb wrapping" by Darkmere
That’s a wrap!

Part 1: Cleaning up parentheses in a string
Part 2: Replacing alternate operators
Part 3: Splitting the search string

Gee, it’s been over a month since my last post already! Since then we’ve been crazy-busy at work, but the good news is that we have a few promising DBA candidates to interview over the next week or two.  I’ve also spoken at the Christchurch and Auckland SQL Server user groups, which were both awesome!

Anyway, to the point of this post. This will wrap up the series on my solution to the challenge of creating a T/SQL-based Google-like search term handler for SQL full-text. In this exciting episode I’ll run through how I handle quotes (single & double quote pairs), the “near” operator, and then the building of the final full-text command.

NEAR enough…

“NEAR” is a full-text proximity operator. This sounds really straightforward, but the catch here is that it actually behaves exactly like the default “AND” Boolean operator unless you’re filtering based on the full-text rank value returned by CONTAINS or CONTAINSTABLE.

In other words, if you search for “dog NEAR cat”, then all documents returned must have both “dog” and “cat” in them.  Where NEAR is different to AND, is that it awards a higher score to documents where “dog” and “cat” are 50 characters or less apart (with a higher score the closer the terms are to each other, or 0 if they are more than 50 characters apart).
So in order for NEAR to actually limit your search results, you’ll need to add a WHERE clause to your query, limiting results to those with ranks greater than 0. Read more here.

-------------------------------------------------------------------------------------------------
-- "NEAR" OPERATOR HANDLER
-------------------------------------------------------------------------------------------------
IF EXISTS(SELECT * FROM @tt_searchwords WHERE REPLACE(operator, '`', '') = 'op_near')
BEGIN
-- reinitialise
SELECT @i = NULL, @current_word = '', @o = NULL, @oo = NULL;
-- a cursor to loop through each NEAR operator row (yes, I'm getting lazy here...)
DECLARE near_string CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR
SELECT rowid
FROM @tt_searchwords
WHERE REPLACE(operator, '`', '') = 'op_near';

OPEN near_string;
FETCH NEXT FROM near_string INTO @i;
WHILE @@FETCH_STATUS = 0
BEGIN
-- get the rowid of the value before and after the NEAR operator row
SELECT @o = MAX(rowid)
FROM @tt_searchwords
WHERE rowid < @i
AND operator NOT LIKE 'paren%';

SELECT @oo = MIN(rowid)
FROM @tt_searchwords
WHERE rowid > @i
AND operator NOT LIKE 'paren%';

-- if the NEAR-operator (~) is the first or last word in the search string, its not a valid option
IF (@o IS NULL) OR (@oo IS NULL)
DELETE @tt_searchwords
WHERE rowid = @i;
ELSE BEGIN
-- since NEAR requires 2 words (one on each side), we get the word before and after the ~ operator
SELECT @current_word = '(' + CASE WHEN PATINDEX('%[()!,]%', word) > 0 THEN '"' + word + '"' ELSE word END
FROM @tt_searchwords
WHERE rowid = @o;

SELECT @current_word = @current_word + ' NEAR ' + CASE WHEN PATINDEX('%[()!,]%', word) > 0 THEN '"' + word + '"' ELSE word END + ')'
FROM @tt_searchwords
WHERE rowid = @oo;

-- update the NEAR operator record with the concatenated search term
UPDATE @tt_searchwords
SET word = @current_word,
operator = 'searchnear'
WHERE rowid = @i;

-- delete the records before and after the NEAR operator
UPDATE @tt_searchwords
SET operator = 'searchword_near'
WHERE rowid IN (@o, @oo);
END

FETCH NEXT FROM near_string INTO @i;
END

CLOSE near_string;
DEALLOCATE near_string;

-- delete search words that are now included IN "NEAR phrases"
DELETE @tt_searchwords
WHERE operator = 'searchword_near';
END

It’s pretty well commented, and isn’t very complicated so I won’t waste any more space here explaining it. If you do have any questions please feel free to leave me a comment though!

Builderer of awesome

This next bit basically takes the content of the table variable (which now contains our correctly formatted and tidied up command fragments), and smushes them together.  It’s not pretty, but it works.

-------------------------------------------------------------------------------------------------
-- Final command string builderer of awesome
-------------------------------------------------------------------------------------------------
-- Need to quote any search terms that contain certain non-alphanumeric characters
UPDATE @tt_searchwords
SET word = CASE
WHEN PATINDEX('%[[[][]]()!,]%', word) > 0 THEN '"' + word + '"'
WHEN CHARINDEX('[', word) > 0 THEN '"' + word + '"'
WHEN CHARINDEX(']', word) > 0 THEN '"' + word + '"'
ELSE word END
WHERE operator NOT IN ('searchphrase', 'searchnear', 'searchprefix'); -- these operator types are already quoted
-- reinitialise
SELECT @i = NULL, @ii = NULL, @final_search = '', @final_search_temp = '', @current_word = '';

-- get the range of row IDs that represent valid search terms and operators
SELECT @i = MIN(rowid),
@ii = MAX(rowid)
FROM @tt_searchwords
WHERE operator LIKE 'search%'
OR operator LIKE 'paren%';

WHILE @i <= @ii
BEGIN
SELECT @final_search_temp =
CASE REPLACE(operator, '`', '')
WHEN 'op_not' THEN ' AND NOT '
WHEN 'op_or' THEN ' OR '
WHEN 'paren_o' THEN ' AND ('
WHEN 'paren_x' THEN ')'
ELSE ' AND '
END
+ CASE REPLACE(operator, '`', '')
WHEN 'op_not' THEN '' -- NOT
WHEN 'op_or' THEN '' -- OR
WHEN 'paren_o' THEN '' -- (
WHEN 'paren_x' THEN '' -- )
WHEN 'searchnear' THEN word -- NEAR
WHEN 'searchphrase' THEN word -- "search phrase"
WHEN 'searchexplicit' THEN word -- explicit search
WHEN 'searchprefix' THEN word -- wildcard
ELSE REPLACE(@word_syntax, '<PLACEHOLDER/>', word)-- AND (default)
END
FROM @tt_searchwords
WHERE rowid = @i;

-- CONTAINSTABLE only accepts up to 4000 characters
IF LEN(CAST(@final_search + @final_search_temp AS VARCHAR(MAX))) > 4000
BREAK
ELSE
SET @final_search = @final_search + @final_search_temp;

IF @i = @ii BREAK;

SELECT @i = MIN(rowid)
FROM @tt_searchwords
WHERE rowid > @i;
END

-- sort out issues of operator double-ups, etc
-- clunky but it works
SET @final_search = REPLACE(@final_search, ' AND NOT AND ', ' AND NOT ');
SET @final_search = REPLACE(@final_search, ' OR AND ', ' OR ');
SET @final_search = '(' + @final_search + ')';
SET @final_search = REPLACE(@final_search, '( AND NOT ', '(');
SET @final_search = REPLACE(@final_search, ' AND NOT )', ')');
SET @final_search = REPLACE(@final_search, '( AND ', '(');
SET @final_search = REPLACE(@final_search, ' AND )', ')');
SET @final_search = REPLACE(@final_search, '( OR ', '(');
SET @final_search = REPLACE(@final_search, ' OR )', ')');
SET @final_search = SUBSTRING(@final_search, 2, LEN(@final_search)-2);

That’s it – I’ve left out some of the fluff in-between the various blocks of code, but have attached the full stored proc here for you to download and tinker with. All I ask is that if you make it better, please let me know so I can improve on this. It could definitely use some love to make it prettier and more efficient, although if pure performance is your goal you should probably investigate a CLR solution instead.

Cheers
DB Dave

4 thoughts on “Full-Text Search Operators – Part 4: Wrapping it up

  1. Peter Davies

    Just a note to say excellent work DBDave, and I never really do this but you have saved me a ton of work with this.
    Given me lots of new ideas to play woth as well.

    Again Superb

    Reply
  2. Pingback: Full-Text Search Operators – Part 1: Cleaning up parentheses in a string - database dave

  3. Pingback: SQL Server word-level bigram function - database dave

Leave a Reply

Your email address will not be published.

This site uses Akismet to reduce spam. Learn how your comment data is processed.