by karmacondon
4141 days ago
I used a similar technology stack for categorizing bookmarks (boilerpipe + gensim LDA). Interesting that we wound up choosing the same tools. In the interest of reporting failed experiments, I also tried a k-means analysis written in PHP. It was slow and worthless; I wouldn't recommend that anyone else go down that road. As a next step, I've been trying to use the open source HLDA software from David M. Blei's group [0] to do hierarchical clustering, which would avoid having to decide on the number of topics up front. I haven't gotten it to compile on my machine yet, though.

[0] http://www.cs.princeton.edu/~blei/topicmodeling.html