Creates a list of unique words for searches. The words have no punctuation, are longer than 2 characters (numbers of any length are kept), and have the most common prepositions and other filler words removed.
The returned words, joined into a single string, are well suited to a keywords column in a table. Put an index (for example a full-text index) on that column so SQL Server can use it efficiently for text searches.
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Extension methods must live in a static class; the class name here is arbitrary.
public static class KeywordExtensions
{
    public static IEnumerable<string> KeywordList(this string text)
    {
        // Split into lower-case words with no leading/trailing punctuation and remove duplicates.
        // The optional group keeps contractions such as "don't" together as one token.
        var uniqueWords = Regex.Matches(text.ToLower(), @"\w+('(s|d|t|ve|m))?")
            .Cast<Match>().Select(m => m.Value).Distinct().ToList();

        // Drop words of one or two letters and contractions (like "I'm" and "don't"), but keep numbers.
        int parsed;
        var result = from s in uniqueWords
                     where (s.Length > 2 && !s.Contains("'")) || int.TryParse(s, out parsed)
                     select s;

        // Words to drop, i.e. not worth searching on.
        const string dropWords = "which,than,that,have,seem,the,with,and,all,only,not,out,into,buy,probably,for,over,from,too,like,who,what,where,when,how,why,here";
        return result.Except(dropWords.Split(','));
    }
}
Because this is an extension method, it can be called directly on any string and chained with other calls. Simply call it like this:
SearchTerm.KeywordList();
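As a worked example, the following is a minimal, self-contained sketch of how the result might be chained into a single string for a keywords column. The class names `KeywordExtensions` and `Demo`, the sample sentence, and the choice of a space as separator are assumptions made for illustration. The method body repeats the steps of `KeywordList` above only so that the sketch compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical container class; extension methods must live in a static class.
public static class KeywordExtensions
{
    // Same steps as the KeywordList method above, repeated so the sketch compiles on its own.
    public static IEnumerable<string> KeywordList(this string text)
    {
        // Lower-case tokens, contractions kept as one token, duplicates removed.
        var uniqueWords = Regex.Matches(text.ToLower(), @"\w+('(s|d|t|ve|m))?")
            .Cast<Match>().Select(m => m.Value).Distinct();

        // Keep words longer than two characters without an apostrophe, plus any integer.
        int parsed;
        var kept = uniqueWords.Where(
            s => (s.Length > 2 && !s.Contains("'")) || int.TryParse(s, out parsed));

        // Same drop list as in the method above.
        const string dropWords = "which,than,that,have,seem,the,with,and,all,only,not,out,into,buy,probably,for,over,from,too,like,who,what,where,when,how,why,here";
        return kept.Except(dropWords.Split(','));
    }
}

public static class Demo
{
    public static void Main()
    {
        // Sample input chosen for illustration only.
        string searchTerm = "I'm looking for 12 cheap hotels and cheap hostels in Paris";

        // Chain the extension method into string.Join to build one keywords string.
        string keywords = string.Join(" ", searchTerm.KeywordList());

        Console.WriteLine(keywords); // prints: looking 12 cheap hotels hostels paris
    }
}
```

In this run, "I'm" is dropped as a contraction, "in" is too short, "for" and "and" are on the drop list, the second "cheap" is a duplicate, and "12" survives because it parses as a number.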