Hacker News
by federkasten 4513 days ago
I am planning to open source it in a few months. (Our code is not well commented or well structured yet...)

The details of our implementation and algorithm are as follows.

The categorization process is written in Python.

Using nltk, it builds a corpus with a TF-IDF model from HN topics and comments. It then trains classifiers on this corpus with an SVM algorithm, using scipy and numpy.
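To make the TF-IDF plus SVM step more concrete, here is a rough sketch of how such a pipeline could look. It is not our actual code: it uses only the Python standard library instead of nltk, scipy and numpy so that it runs anywhere, it trains a single binary linear SVM by stochastic subgradient descent on the hinge loss (with no bias term, for brevity), and the training titles, the two category names and all function names are made up for illustration.

```python
import math
import random
from collections import Counter


def fit_tfidf(token_docs):
    """Learn a vocabulary and smoothed IDF weights from tokenized documents."""
    doc_freq = Counter()
    for tokens in token_docs:
        doc_freq.update(set(tokens))
    vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
    n_docs = len(token_docs)
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    return vocab, idf


def tfidf_vector(tokens, vocab, idf):
    """Return an L2-normalized sparse TF-IDF vector as {feature_index: weight}."""
    counts = Counter(t for t in tokens if t in vocab)  # unknown terms are dropped
    vec = {vocab[t]: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {i: w / norm for i, w in vec.items()}


def score(weights, vec):
    """Dot product between the dense weight list and a sparse vector."""
    return sum(weights[i] * v for i, v in vec.items())


def train_linear_svm(vectors, labels, dim, lr=0.1, reg=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. No bias term is learned in this sketch.
    """
    rng = random.Random(seed)
    weights = [0.0] * dim
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = labels[i] * score(weights, vectors[i]) < 1.0
            # Regularization step: shrink every weight slightly.
            weights = [w * (1.0 - lr * reg) for w in weights]
            if violated:
                # Hinge-loss step: move toward classifying this example correctly.
                for j, v in vectors[i].items():
                    weights[j] += lr * labels[i] * v
    return weights


# Hypothetical hand-labeled examples: +1 = "web", -1 = "hardware".
TRAINING = [
    ("jquery plugin html forms", 1),
    ("javascript css html layout", 1),
    ("jquery javascript browser events", 1),
    ("cpu benchmark memory", -1),
    ("gpu cpu power", -1),
    ("arm chip cpu design", -1),
]

token_docs = [text.lower().split() for text, _ in TRAINING]
labels = [label for _, label in TRAINING]
vocab, idf = fit_tfidf(token_docs)
vectors = [tfidf_vector(tokens, vocab, idf) for tokens in token_docs]
weights = train_linear_svm(vectors, labels, dim=len(vocab))


def classify(text):
    """Label a new title using the trained weights."""
    vec = tfidf_vector(text.lower().split(), vocab, idf)
    return "web" if score(weights, vec) >= 0 else "hardware"


print(classify("html jquery tutorial"))
print(classify("cpu chip review"))
```

With more than two categories, one common approach is to train one such classifier per category (one-vs-rest) and pick the category with the highest score.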

FYI, its web interface is written in Clojure and ClojureScript.

1 comments

Presumably you've trained it with hand-annotated content, or bootstrapped from a few choice HN searches (like ?q=jquery will give you a web tech category)
Yes. You are right.

I trained the classifiers on hand annotations (about 1000 items or so).