by gerenuk 3119 days ago
Here is a brief overview of what you need to do:

1. Use nltk to tokenize and part-of-speech tag each topic description, then keep the nouns and other content words.

2. Run LDA or TF-IDF (both available in gensim) over your questions: LDA gives you the dominant topics, while TF-IDF gives you the most distinctive terms.

3. Once the corpus is built, use cosine similarity to assign each question to the closest category.

See the gensim documentation for more details on topic modeling.
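
To make the steps above concrete, here is a rough sketch of the TF-IDF plus cosine similarity part. It is written with only the Python standard library so it runs anywhere; it is not gensim's or nltk's API, and in practice you would likely swap in their tokenizer, TF-IDF model, and similarity index. The category names, the example text, the stopword list, and every function name (`tokenize`, `build_idf`, `tfidf_vector`, `cosine`, `tag_question`) are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical categories, each described by a few example questions.
CATEGORY_DOCS = {
    "python": "how do I sort a list in python how do I read a file in python",
    "databases": "how do I write a sql join query how do I index a database table",
    "networking": "how does tcp handshake work how do I open a socket connection",
}

# Tiny illustrative stopword list; a crude stand-in for nltk-based noun filtering.
STOPWORDS = {"how", "do", "i", "a", "in", "the", "to", "does", "is", "an", "of"}


def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_idf(token_lists):
    """Smoothed inverse document frequency for every term in the corpus."""
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms outside the corpus vocabulary are ignored."""
    counts = Counter(tokens)
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tag_question(question, category_docs=CATEGORY_DOCS):
    """Return (best_category, scores); best_category is None if nothing overlaps."""
    names = list(category_docs)
    token_lists = [tokenize(category_docs[name]) for name in names]
    idf = build_idf(token_lists)
    q_vec = tfidf_vector(tokenize(question), idf)
    scores = {
        name: cosine(q_vec, tfidf_vector(tokens, idf))
        for name, tokens in zip(names, token_lists)
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] > 0.0 else None), scores


if __name__ == "__main__":
    category, scores = tag_question("How do I join two tables in sql?")
    print(category, scores)
```

Note that this matches exact word forms only ("tables" will not match "table"), which is one reason to add stemming or lemmatization from nltk before building the vectors.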