HN Mirror



	by zmillman 4225 days ago
	Hmm, what about word stemming? I looked up "windmills" and got an empty result set.

3 comments

byoung2 4225 days ago

Same with "walked". I've used the Porter Stemming Algorithm in the past, and it works well.

http://tartarus.org/martin/PorterStemmer/
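
For readers unfamiliar with suffix stripping, here is a minimal sketch of the idea. It is not the Porter algorithm itself, only a rough subset loosely modeled on its first step (plural, "-ed" and "-ing" endings); the function name `simple_stem` is made up for illustration, and the full algorithm linked above has several more steps and conditions.

```python
def simple_stem(word: str) -> str:
    """Strip a few common English suffixes (illustrative, not full Porter)."""
    w = word.lower()

    # Plural endings, loosely following Porter step 1a.
    if w.endswith("sses"):
        w = w[:-2]          # caresses -> caress
    elif w.endswith("ies"):
        w = w[:-2]          # ponies -> poni
    elif w.endswith("ss"):
        pass                # caress stays caress
    elif w.endswith("s"):
        w = w[:-1]          # windmills -> windmill

    # Past-tense and progressive endings, loosely following step 1b:
    # strip only when the remaining stem still contains a vowel.
    for suffix in ("ed", "ing"):
        stem = w[: -len(suffix)]
        if w.endswith(suffix) and any(c in "aeiou" for c in stem):
            w = stem
            break

    return w


if __name__ == "__main__":
    for term in ("windmills", "walked", "walking", "sing"):
        print(term, "->", simple_stem(term))
```

Applying the same function to both the indexed words and the search term is what lets "windmills" and "windmill" meet on a common key.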


chrisfarms 4224 days ago

The data is stored in postgres, so it should be simple enough to use the Snowball dictionary/stemmer and the tsvector/tsquery functions to sort this out.
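
A rough sketch of what that could look like from application code. The table `entries` and column `word` are hypothetical, since the thread does not describe the schema, and the `%s` placeholder assumes a DB-API driver with that parameter style (psycopg2, for example). `to_tsvector`, `plainto_tsquery` and the `@@` match operator are standard Postgres full-text search features, and the built-in `english` configuration stems with a Snowball dictionary.

```python
# Hypothetical schema: a table "entries" with a text column "word".
SEARCH_SQL = """
SELECT word
FROM entries
WHERE to_tsvector('english', word) @@ plainto_tsquery('english', %s)
"""


def search(cursor, term):
    """Run a stemmed full-text match through a DB-API cursor.

    Both the stored word and the search term are normalized by the
    'english' text search configuration, so inflected forms can match.
    """
    cursor.execute(SEARCH_SQL, (term,))
    return [row[0] for row in cursor.fetchall()]
```

With this query a search for "windmills" should match a row containing "windmill", because both sides are reduced to the same lexeme. For larger tables, an expression index on the `to_tsvector(...)` call is the usual way to avoid recomputing it per row.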


lotophage 4224 days ago

What you really want is a lemmatizer (stemming approximates lemmatization). I believe that NLTK has a WordNet lemmatizer, but I don't know much about it.
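
To illustrate the distinction, here is a toy dictionary-backed lemmatizer. It is not how NLTK's WordNet lemmatizer works internally; the `IRREGULAR` and `VOCABULARY` tables are tiny hand-written stand-ins for a real lexicon, and the name `lemmatize` is only for this sketch. The point is that a lemmatizer returns an actual dictionary word, whereas a stemmer may return a truncated form.

```python
# Tiny illustrative tables; a real lemmatizer would consult a full lexicon.
IRREGULAR = {"mice": "mouse", "geese": "goose", "went": "go"}
VOCABULARY = {"windmill", "walk", "pony", "mouse", "goose", "go"}

# (suffix, replacement) pairs for regular inflections, tried in order.
RULES = (("ies", "y"), ("es", ""), ("s", ""), ("ed", ""), ("ing", ""))


def lemmatize(word: str) -> str:
    """Map an inflected word to its dictionary form when one is known."""
    w = word.lower()

    # Irregular forms cannot be derived by rule, so look them up directly.
    if w in IRREGULAR:
        return IRREGULAR[w]

    # Accept a rule's result only if it is a known word.
    for suffix, replacement in RULES:
        if w.endswith(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in VOCABULARY:
                return candidate

    # Unknown words are returned unchanged rather than guessed at.
    return w
```

Here "ponies" comes back as "pony" and "mice" as "mouse", results that plain suffix stripping cannot produce.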


impostervt 4225 days ago

I'll have to see if I can find a good library for this. The ones I tried (like Node Natural) just didn't give great results.
