by SeppoErviala
4569 days ago
Check out gensim if you want to do topic modeling or similarity comparisons in Python: http://radimrehurek.com/gensim/ It has good implementations of various algorithms, some of which support streaming or distributed computation, and it can load and dump data in various formats. I've used it to build a content-based recommender using TF-IDF, LSI and a similarity index. Once the index is built, queries against it are really fast. It can handle quite large corpora with little memory.
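For anyone unsure what a TF-IDF similarity index actually does, here is a rough plain-Python sketch of the idea. It is not gensim's API and skips the LSI step entirely; the function names (`build_index`, `query_index`) and the toy documents are made up for illustration, and everything is held in memory rather than streamed.

```python
import math
from collections import Counter


def _tfidf(tokens, df, n_docs):
    """Turn a token list into an L2-normalized sparse TF-IDF vector."""
    tf = Counter(t for t in tokens if t in df)
    vec = {t: count * math.log(n_docs / df[t]) for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    # A vector with no informative terms has zero norm; return it empty.
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """Hypothetical helper: precompute one TF-IDF vector per document."""
    df = Counter(term for doc in docs for term in set(doc))
    vectors = [_tfidf(doc, df, len(docs)) for doc in docs]
    return vectors, df


def query_index(index, tokens):
    """Hypothetical helper: rank documents by cosine similarity to the query."""
    vectors, df = index
    q = _tfidf(tokens, df, len(vectors))
    scores = [
        (doc_id, sum(w * vec.get(t, 0.0) for t, w in q.items()))
        for doc_id, vec in enumerate(vectors)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy corpus, purely illustrative.
docs = [
    "fast similarity queries over large text corpora".split(),
    "fortran compiler needed on production boxes".split(),
    "topic modeling and similarity comparisons in python".split(),
]
index = build_index(docs)
results = query_index(index, "fortran compiler".split())
print(results)  # list of (document id, score), best match first
```

The expensive part (document frequencies and normalized vectors) is done once in `build_index`, which is why queries are cheap afterwards. A real library adds the pieces this sketch leaves out: dimensionality reduction, on-disk index shards, and processing the corpus one document at a time.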
The reason for that is a pretty epic list of dependencies (have fun explaining why the prod boxes need a Fortran compiler), but in terms of efficiency and speed of development it's an obvious choice.