Hacker News
by kamaln7 2708 days ago
We used Lucene (open source) in our information retrieval course, and tokenization (including stop-word removal, etc.) is one of the things it does. If you just want to experiment, it's another option worth looking at!
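To get a rough feel for what that analysis step does before pulling in Lucene, here's a minimal stdlib-only Python sketch of the same idea. It splits text into tokens, lowercases them, then drops stop words, roughly the tokenizer → lowercase → stop-filter order of a Lucene analysis chain. This is not Lucene's API. The regex tokenizer, the `analyze` helper, and the short `STOP_WORDS` list are hypothetical and only for illustration. Lucene's analyzers use a Unicode-aware tokenizer and their own stop-word lists.

```python
import re

# Hypothetical, deliberately tiny English stop-word list (illustration only;
# this is NOT Lucene's stop-word set).
STOP_WORDS = frozenset({"a", "an", "and", "are", "is", "of", "the", "to", "in", "it"})

# Assumed tokenizer rule: runs of ASCII letters/digits, allowing internal
# apostrophes (e.g. "Don't"). Much simpler than Lucene's Unicode-aware tokenizer.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:'[A-Za-z0-9]+)*")


def tokenize(text):
    """Split raw text into word tokens, preserving original case."""
    return TOKEN_RE.findall(text)


def analyze(text, stop_words=STOP_WORDS):
    """Tokenize, lowercase each token, then drop stop words."""
    lowered = (token.lower() for token in tokenize(text))
    return [token for token in lowered if token not in stop_words]


print(analyze("The quick brown fox is one of the fastest animals in the forest."))
# ['quick', 'brown', 'fox', 'one', 'fastest', 'animals', 'forest']
```

Lowercasing happens before the stop-word check so that "The" and "the" are both removed. Swapping in a larger stop list or a different tokenizer rule only changes the two module-level constants.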