HN Mirror


by arrmn 3196 days ago

I'm currently working on something similar; these were my first two ideas:

Train your own word2vec model on a Twitter dataset, then represent each tweet as the tf-idf-weighted average of its word vectors. That gives you one vector per tweet, and tweets about the same topic should end up close to each other. From there you can try clustering algorithms, or use cosine distance to find the nearest X tweets.
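A rough sketch of the first idea, in plain Python so it runs without any libraries. The `WORD_VECTORS` table is a made-up stand-in for a word2vec model you would train on real tweets, the four tweets are invented, and the smoothed idf formula is just one common choice, so treat all of it as assumptions rather than a recipe:

```python
import math
from collections import Counter

# Hypothetical 3-d embeddings standing in for a trained word2vec model.
WORD_VECTORS = {
    "goal":     [0.9, 0.1, 0.0],
    "match":    [0.8, 0.2, 0.1],
    "striker":  [0.9, 0.0, 0.1],
    "election": [0.1, 0.9, 0.0],
    "vote":     [0.0, 0.8, 0.2],
    "senate":   [0.1, 0.9, 0.1],
    "great":    [0.4, 0.4, 0.4],
}

# Invented example tweets.
TWEETS = [
    "great goal in the match",
    "striker scores great goal",
    "vote in the election",
    "senate election vote great",
]


def tokenize(text):
    return text.lower().split()


def idf_table(docs):
    """Smoothed idf per token: ln((1 + n) / (1 + df)) + 1."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tweet_vector(text, idf, vectors):
    """tf-idf-weighted average of the word vectors of one tweet."""
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, count in Counter(tokenize(text)).items():
        if tok not in vectors:  # skip words the embedding model never saw
            continue
        weight = count * idf.get(tok, 1.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += weight * x
        weight_sum += weight
    if weight_sum == 0.0:
        return total  # no known words: zero vector
    return [x / weight_sum for x in total]


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an empty vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query_index, vecs, k):
    """Indices of the k tweets closest to the query tweet by cosine distance."""
    dists = [
        (cosine_distance(vecs[query_index], v), i)
        for i, v in enumerate(vecs)
        if i != query_index
    ]
    return [i for _, i in sorted(dists)[:k]]


IDF = idf_table(TWEETS)
TWEET_VECS = [tweet_vector(t, IDF, WORD_VECTORS) for t in TWEETS]
```

With these toy numbers, `nearest(0, TWEET_VECS, 1)` picks the other football tweet and `nearest(2, TWEET_VECS, 1)` picks the other politics tweet. On real data you would swap the table for lookups into your trained model and probably hand the tweet vectors to a clustering library.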

The second idea would be to train doc2vec on Twitter data.

Another idea that might be worthwhile is LDA, though I haven't tried it myself.