by gibrown 3920 days ago
Interesting. So this basically uses the background word2vec data for the entire Web to provide more information and help with things like disambiguation, synonyms, etc.? Am I understanding that correctly?

Maybe a nit-picky thought, but it's not clear to me that the TF-IDF part is what's doing a lot of the extra lifting there.

Do you know of any good evaluations between using vector space data and other methods for summarization?

1 comment

Word2Vec was a fork of, or based on, a more exhaustive vector space approach; see https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...

I've compared the summarization to others like OTS (http://libots.sourceforge.net/), which I believe relies strictly on TF-IDF. This approach seems better, and it allows context to control the summarization.
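
For anyone unfamiliar with what a purely TF-IDF-driven summarizer does, here is a rough sketch. It is my own minimal illustration, not OTS's actual code: each sentence is treated as a "document", terms are weighted by how rare they are across sentences, and the highest-scoring sentences are kept. The function names (split_sentences, tokenize, tfidf_summarize) and the naive regex sentence splitter are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (an assumption for this sketch): break on
    # whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase and keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", sentence.lower())


def tfidf_summarize(text, n=1):
    """Return the n sentences with the highest mean TF-IDF weight,
    in their original order."""
    sentences = split_sentences(text)
    docs = [tokenize(s) for s in sentences]
    total = len(docs)

    # Document frequency: how many sentences contain each term.
    df = Counter(term for doc in docs for term in set(doc))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Sum of tf * idf over the sentence's distinct terms, divided by
        # the token count so long sentences are not favored automatically.
        weight = sum(count * math.log(total / df[term])
                     for term, count in tf.items())
        scores.append(weight / len(doc) if doc else 0.0)

    # Pick the n best-scoring sentences, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(top)]
```

Note that nothing here knows that two different words are related, which is the gap the word vectors are meant to fill.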

Other similar approaches might be based on Latent Semantic Analysis, Latent Semantic Indexing or LDA.

Thanks for the links!