by zmillman 4177 days ago
Hmm, what about word stemming? I looked up "windmills" and got an empty result set.

Same with "walked". I've used the Porter Stemming Algorithm in the past, and it works well.

http://tartarus.org/martin/PorterStemmer/
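Here is a minimal Python sketch of just the first two steps of Porter's algorithm, to show how it works: 1a handles plurals and 1b handles -ed/-ing. I wrote it from the published rules rather than taking it from the page above. Steps 1c through 5 are omitted, so this is not a complete stemmer, and the helper names are my own.

```python
def _is_consonant(word: str, i: int) -> bool:
    """Porter: a consonant is a letter other than a, e, i, o, u, and other
    than 'y' when that 'y' follows a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not _is_consonant(word, i - 1)
    return True


def _measure(stem: str) -> int:
    """m in the form [C](VC)^m[V]: the number of vowel-to-consonant runs."""
    m, prev_vowel = 0, False
    for i in range(len(stem)):
        cons = _is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def _has_vowel(stem: str) -> bool:
    return any(not _is_consonant(stem, i) for i in range(len(stem)))


def _ends_double_consonant(w: str) -> bool:
    return len(w) >= 2 and w[-1] == w[-2] and _is_consonant(w, len(w) - 1)


def _ends_cvc(w: str) -> bool:
    """Consonant-vowel-consonant ending where the last letter is not w, x or y."""
    n = len(w)
    return (n >= 3 and _is_consonant(w, n - 3) and not _is_consonant(w, n - 2)
            and _is_consonant(w, n - 1) and w[-1] not in "wxy")


def step_1a(w: str) -> str:
    """Plurals: sses -> ss, ies -> i, ss -> ss, s -> (removed)."""
    if w.endswith("sses") or w.endswith("ies"):
        return w[:-2]
    if w.endswith("ss"):
        return w
    if w.endswith("s"):
        return w[:-1]
    return w


def step_1b(w: str) -> str:
    """(m>0) eed -> ee; (*v*) ed -> ; (*v*) ing -> ; plus the clean-up rules."""
    if w.endswith("eed"):
        return w[:-1] if _measure(w[:-3]) > 0 else w
    for suffix in ("ed", "ing"):
        if w.endswith(suffix):
            stem = w[: -len(suffix)]
            if not _has_vowel(stem):
                return w
            if stem.endswith(("at", "bl", "iz")):
                return stem + "e"            # conflat(ed) -> conflate
            if _ends_double_consonant(stem) and stem[-1] not in "lsz":
                return stem[:-1]             # hopp(ing) -> hop
            if _measure(stem) == 1 and _ends_cvc(stem):
                return stem + "e"            # fil(ing) -> file
            return stem
    return w


def stem(word: str) -> str:
    """Apply steps 1a and 1b only; the full algorithm continues with 1c-5."""
    w = word.lower()
    if len(w) <= 2:
        return w
    return step_1b(step_1a(w))


if __name__ == "__main__":
    for w in ("windmills", "walked", "hopping", "filing"):
        print(w, "->", stem(w))
```

Both failing queries from this thread reduce to the same form as the base word ("windmills" to "windmill", "walked" to "walk"). That only helps if the stored text is stemmed with the same function as the query.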

The data is stored in Postgres, so it should be simple enough to use the Snowball dictionary (stemmer) together with the tsvector/tsquery functions to sort this out.
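The key point is that the stored text and the query must go through the same normalizer. In Postgres, an expression like to_tsvector('english', body) @@ plainto_tsquery('english', 'windmills') does that, because the built-in english configuration stems with the Snowball english_stem dictionary. Here, body is a hypothetical column name. Below is a minimal Python sketch of the same principle. Note that crude_stem is a hypothetical stand-in that strips a few suffixes and is not Snowball, and the documents are made up.

```python
from collections import defaultdict


def crude_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer such as Snowball's english
    stemmer: lower-cases, trims punctuation, strips one of a few suffixes."""
    word = word.lower().strip(".,;:!?\"'")
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each stemmed term to the ids of documents containing it
    (loosely the job of a tsvector column plus its index)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[crude_stem(token)].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> set[int]:
    """Ids of documents containing every query term, after stemming the
    query with the same function used at index time (AND semantics)."""
    terms = [crude_stem(t) for t in query.split()]
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        1: "The old windmill by the river.",
        2: "She walks to work.",
        3: "Dutch windmills and canals.",
    }
    index = build_index(docs)
    print(search(index, "windmills"))
    print(search(index, "walked"))
```

Stemming only the query, or only the documents, would bring back the empty result set. The two sides have to agree.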

What you really want is a lemmatizer (stemming only approximates lemmatization). I believe NLTK has a WordNet lemmatizer, but I don't know much about it.
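Here is the difference in practice. A lemmatizer maps a word to its dictionary form using a lexicon. That means it can handle irregular forms ("ran", "geese") that no suffix-stripping rule will catch, and it returns real words ("pony" rather than Porter's "poni"). Below is a minimal sketch, assuming a tiny hypothetical lexicon. NLTK's WordNetLemmatizer (lemmatize(word, pos)) consults WordNet instead, and it needs a part-of-speech hint such as pos="v" to do well on verbs.

```python
# Hypothetical miniature lexicon. A real lemmatizer (for example NLTK's
# WordNetLemmatizer) looks words up in a full dictionary such as WordNet.
KNOWN_LEMMAS = {"run", "go", "goose", "mouse", "pony", "walk", "windmill", "hop"}
IRREGULAR = {"ran": "run", "went": "go", "geese": "goose", "mice": "mouse"}

# Suffix rules only propose candidates; a candidate is accepted
# only if the result is a known dictionary word.
RULES = [("ies", "y"), ("es", ""), ("s", ""),
         ("ed", "e"), ("ed", ""), ("ing", "e"), ("ing", "")]


def lemmatize(word: str) -> str:
    """Return the dictionary form of `word`, or `word` unchanged if unknown."""
    w = word.lower()
    if w in KNOWN_LEMMAS:
        return w
    if w in IRREGULAR:
        return IRREGULAR[w]
    for suffix, replacement in RULES:
        if not w.endswith(suffix):
            continue
        candidate = w[: -len(suffix)] + replacement
        if candidate in KNOWN_LEMMAS:
            return candidate
        # Undo consonant doubling: "hopping" -> "hopp" -> "hop".
        if (len(candidate) >= 2 and candidate[-1] == candidate[-2]
                and candidate[:-1] in KNOWN_LEMMAS):
            return candidate[:-1]
    return w


if __name__ == "__main__":
    for w in ("ran", "geese", "ponies", "walked", "hopping", "zebras"):
        print(w, "->", lemmatize(w))
```

Unknown words such as "zebras" pass through unchanged. A lemmatizer is only as good as its dictionary, while a stemmer always produces some output.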

I'll have to see if I can find a good library for this. The ones I tried (like Node Natural) just didn't give great results.