Keyword algorithm extension method

Creates a list of unique words for searches. The words have no punctuation and are longer than 2 characters (numbers are kept whatever their length), with contractions and the most common filler words, such as prepositions, removed.

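The method returns a sequence of words. Joined into a single string (for example with string.Join), it is well suited to a keywords field in a table. Set an index on the column (for word searches this would typically be a full-text index) so that SQL Server can use it efficiently for text searches.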

           
        using System.Collections.Generic;
        using System.Linq;
        using System.Text.RegularExpressions;

        // extension methods must be declared in a static class
        public static class KeywordExtensions
        {
            public static IEnumerable<string> KeywordList(this string text)
            {
                // split into lower case words with no trailing/leading punctuation and remove duplicate words
                var uniqueWords = Regex.Matches(text.ToLower(), "\\w+('(s|d|t|ve|m))?")
                    .Cast<Match>().Select(x => x.Value).Distinct();
                // remove words of two letters or fewer and contractions (like I'm and don't) but leave numbers
                int a;
                var result = from s in uniqueWords
                             where (s.Length > 2 && !s.Contains("'")) || int.TryParse(s, out a)
                             select s;
                // fill the 'filter' list with words to drop, i.e. words not worth searching on
                string dropwords = "which,than,that,have,seem,the,with,and,all,only,not,out,into,buy,probably,for,over,from,too,like,who,what,where,when,how,why,here";
                List<string> filter = dropwords.Split(',').ToList();
                // Except removes the drop words (and any remaining duplicates)
                return result.Except(filter);
            }
        }
    

Extension methods are easy to chain. Simply call it like this:

         
           IEnumerable<string> keywords = SearchTerm.KeywordList();
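
As a rough illustration of the chaining mentioned above, the sketch below calls KeywordList and then joins the result into one string for a keywords column. It is only a sketch under a few assumptions: the KeywordList body is a compact restatement of the method above (using a HashSet lookup in place of Except, and ToLowerInvariant in place of ToLower) so that the example compiles on its own, and ToKeywordField, IsNumber, and the sample search text are hypothetical names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeywordExtensions
{
    // Same drop words as the method above.
    private static readonly HashSet<string> DropWords = new HashSet<string>(
        ("which,than,that,have,seem,the,with,and,all,only,not,out,into,buy,probably," +
         "for,over,from,too,like,who,what,where,when,how,why,here").Split(','));

    // Compact restatement of KeywordList so this sketch is self-contained.
    public static IEnumerable<string> KeywordList(this string text)
    {
        return Regex.Matches(text.ToLowerInvariant(), @"\w+('(s|d|t|ve|m))?")
            .Cast<Match>()
            .Select(m => m.Value)
            .Distinct()
            // keep words longer than 2 characters without an apostrophe, or any integer
            .Where(w => (w.Length > 2 && !w.Contains("'")) || IsNumber(w))
            .Where(w => !DropWords.Contains(w));
    }

    // Hypothetical helper: joins the keywords into one space-separated string
    // that could be stored in a keywords column.
    public static string ToKeywordField(this IEnumerable<string> words)
    {
        return string.Join(" ", words);
    }

    private static bool IsNumber(string s)
    {
        int n;
        return int.TryParse(s, out n);
    }
}

public static class Program
{
    public static void Main()
    {
        string searchTerm = "Where can I buy the best 42 inch TV that's not too expensive?";

        // Chain the extension methods: extract keywords, then join them.
        string keywordField = searchTerm.KeywordList().ToKeywordField();
        Console.WriteLine(keywordField); // prints "can best 42 inch expensive"

        // Any LINQ operator can sit in the middle of the chain.
        Console.WriteLine(searchTerm.KeywordList().Take(3).ToKeywordField()); // prints "can best 42"
    }
}
```

A space is used as the separator here only as one possible choice; any delimiter that cannot occur inside a keyword would work as well.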