by mlthoughts2018 2928 days ago
Yes, I work on a team that uses NLTK for lots of word canonicalization tasks in an NLP-heavy search engine. There are other options that work well too, but we have found NLTK to be very good, even at large scale.

Our pipeline uses NLTK to take in a string of text, do word tokenization, lemmatization and stemming, and construct bigrams and trigrams, as part of a large map-reduce job for building text search indices.
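
For a rough idea of what those steps look like on a single string, here is a minimal sketch. It is illustrative only, not the actual job code: the function name `index_terms` and the choice of `wordpunct_tokenize` and `PorterStemmer` are assumptions made for the example, and the map-reduce part is left out entirely. Lemmatization with `WordNetLemmatizer` needs the WordNet corpus to be downloaded, so the sketch skips that step if the data is missing.

```python
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import wordpunct_tokenize
from nltk.util import ngrams


def lemmatize_tokens(tokens):
    """Lemmatize tokens as nouns; return them unchanged if WordNet data is absent."""
    try:
        lemmatizer = WordNetLemmatizer()
        return [lemmatizer.lemmatize(t) for t in tokens]
    except LookupError:
        # The WordNet corpus has not been downloaded (nltk.download("wordnet")).
        return list(tokens)


def index_terms(text):
    """Hypothetical helper: tokenize, lemmatize, stem, then build bigrams and trigrams."""
    # Regex-based tokenizer; lowercase and drop pure-punctuation tokens.
    tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalnum()]
    lemmas = lemmatize_tokens(tokens)
    stemmer = PorterStemmer()
    stems = [stemmer.stem(t) for t in lemmas]
    bigrams = list(ngrams(stems, 2))
    trigrams = list(ngrams(stems, 3))
    return stems, bigrams, trigrams


if __name__ == "__main__":
    stems, bigrams, trigrams = index_terms("Cats running dogs")
    print(stems)
    print(bigrams)
    print(trigrams)
```

In a map-reduce setting, something like `index_terms` would presumably run inside the map step, emitting each stem and n-gram as a key for the index; that wiring depends on the framework and is not shown here.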