Removing accented and illegal characters from a string

This post isn’t related to the series I’m currently writing on Full-Text search operators, but it is in the same ballpark so I thought I’d throw it up here.

Today I was working on our search term auto-suggest system (i.e. the system that shows a list of suggestions in a drop-down list below the search box as you type your search term in), and needed to improve the way we strip out “illegal” characters from our dictionary terms.

Illegal characters in our case means pretty much anything that isn’t alphanumeric, or a space, dollar sign, ampersand, hyphen, or period. We don’t want any accented characters (i.e. letters with diacritics) either – although rather than stripping those out, we want to replace them with their unaccented counterparts (e.g. Å should be replaced by A).

The way I did this (and I’m sure there are many, many better ways) is to insert each illegal character into a table variable, along with its replacement character, and then update the input variable once for each row in this table variable – replacing any occurrences of that row’s illegal character with its replacement value.
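A minimal sketch of that approach might look something like the following. This is an illustration rather than the original code from our system, and it assumes SQL Server 2008 or later. The names (@term, @illegal, bad_char, good_char), the sample string, and the handful of mapped characters are all made up for the example; a real mapping table would need a row for every character you want to strip or swap. The binary collation is also an assumption, added so that Å only matches Å (and not å or plain A) on a case- or accent-insensitive server.

```sql
-- Hypothetical input; in practice this would be a dictionary term.
DECLARE @term NVARCHAR(200) = N'Ångström café & co. #1!';

-- Table variable mapping each illegal character to its replacement.
-- An empty replacement means "just remove the character".
DECLARE @illegal TABLE (
    id        INT IDENTITY(1,1) PRIMARY KEY,
    bad_char  NCHAR(1)    NOT NULL,
    good_char NVARCHAR(1) NOT NULL  -- NVARCHAR so N'' is not padded to a space
);

INSERT INTO @illegal (bad_char, good_char)
VALUES (N'Å', N'A'), (N'å', N'a'),
       (N'é', N'e'), (N'ö', N'o'),
       (N'#', N''),  (N'!', N'');

-- Walk the mapping table one row at a time, applying each replacement
-- to the input variable.
DECLARE @i INT = 1, @rows INT;
SELECT @rows = COUNT(*) FROM @illegal;

WHILE @i <= @rows
BEGIN
    -- The explicit binary collation makes the match exact
    -- (case- and accent-sensitive) regardless of the server default.
    SELECT @term = REPLACE(@term COLLATE Latin1_General_BIN2, bad_char, good_char)
    FROM @illegal
    WHERE id = @i;

    SET @i += 1;
END;

SELECT @term AS cleaned_term;  -- Angstrom cafe & co. 1
```

The explicit WHILE loop is a deliberate choice in this sketch: assigning to a variable in a single SELECT over every row of the table is shorter, but SQL Server doesn’t guarantee how a multi-row variable assignment is evaluated, so stepping through the rows one at a time keeps the behaviour predictable.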

Let me know if you have any questions, or different/better ways of doing this (yes, I know I could do it using a couple of lines of .NET in a CLR function, but I don’t have CLR enabled on this server).


DB Dave
