by jaredchung 3217 days ago
We just implemented topic suggestions as well, on our Q&A data. The literature in this field is actually quite solid. Here are a few key insights from our research:

1) Using recall@5 as your test metric lets you compare your results against the academic literature, which often (but not always) reports recall@5.

2) We read the papers that introduce the following systems: TagCombine (state of the art in 2013), NetTagCombine, EnTagRec, TagMulRec, fastText's supervised topic recommendation, and a few others. Unfortunately, other than fastText, not many of these have OSS libraries available out of the box, but that's OK as long as you're willing to build on the underlying methods, which are available off the shelf.

3) Our approach: we tested a few of the above systems, and also mixed and matched our own systems built from tf-idf, topic modeling, multilabel classification, L-LDA, fastText, and others. For each attempt we calculated (a) recall@5 against our test set (we used k-fold cross validation) and (b) how long the training step took.

4) In the end, we decided that we got the best combination of recall@5, training time, and engineer efficiency by using ONLY the fastText library. We spent about 1 week trying other methods before we tried fastText. fastText took us about 3 hours to get first results, and then we tweaked the parameters for another two days before we found the right combination of learning rate, epochs, n-gram size, etc. (read the docs :) Our current recall@5 for v1 of this feature is a little above 0.5, which we think is good enough to provide a solid user experience.
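For anyone who wants to reproduce the metric, here is a minimal sketch of recall@k in plain Python. The function names are made up for illustration, and it assumes the simple definition (share of an item's true tags that show up in the top k suggestions, averaged over items). Some of the papers cap the denominator at k, so check the definition before comparing numbers.

```python
from typing import Sequence, Set


def recall_at_k(ranked: Sequence[str], true_tags: Set[str], k: int = 5) -> float:
    """Share of one item's true tags that appear in its top-k suggestions."""
    if not true_tags:
        raise ValueError("true_tags must not be empty")
    top_k = set(ranked[:k])  # only the first k suggestions count
    return len(top_k & true_tags) / len(true_tags)


def mean_recall_at_k(
    predictions: Sequence[Sequence[str]],
    gold: Sequence[Set[str]],
    k: int = 5,
) -> float:
    """Average recall@k over all test items."""
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and aligned")
    scores = [recall_at_k(p, g, k) for p, g in zip(predictions, gold)]
    return sum(scores) / len(scores)
```

With cross validation you would compute this once per held-out fold and average the fold scores.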

Caveat: Unlike GitHub, we actually do have user-generated labels to start with, so our modeling problem has one less step of complexity than theirs but is otherwise the same.

(TL;DR If you have gold standard data to start with and no time to read academic articles, try fastText supervised learning. If you have time to read academic articles, start with a lit search.)
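A rough sketch of what the fastText route can look like with the official Python bindings. This is not our production code: the helper names, the hyphenated tag normalization, and every hyperparameter value below are placeholders picked for illustration, and the one-vs-all loss is just one reasonable choice when each question can carry several tags. The training file uses fastText's `__label__` prefix format, one example per line.

```python
import re
from typing import Iterable, List


def to_fasttext_line(text: str, tags: Iterable[str]) -> str:
    """Format one example as '__label__a __label__b text' on a single line."""
    # Assumption: spaces inside a tag become hyphens so each tag is one token.
    labels = " ".join(f"__label__{t.strip().replace(' ', '-')}" for t in tags)
    # Collapse newlines and repeated whitespace, then lowercase.
    clean = re.sub(r"\s+", " ", text).strip().lower()
    return f"{labels} {clean}"


def train_and_suggest(train_path: str, question: str, k: int = 5) -> List[str]:
    """Train on a file of formatted lines and return the top-k tags."""
    # Imported lazily so the formatting helper works without fastText installed.
    import fasttext

    model = fasttext.train_supervised(
        input=train_path,
        lr=0.5,        # placeholder, tune on your own data
        epoch=25,      # placeholder
        wordNgrams=2,  # placeholder
        loss="ova",    # one-vs-all, for multiple tags per question
    )
    # predict() expects a single line of text, so normalize it the same way.
    query = re.sub(r"\s+", " ", question).strip().lower()
    labels, _probs = model.predict(query, k=k)
    return [label.replace("__label__", "", 1) for label in labels]
```

Write one `to_fasttext_line(...)` result per line to a text file, pass that path as `train_path`, and score the returned tags against your held-out labels with recall@5.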


Great reply, thank you!

Can I ask what your site is?

CareerVillage.org