by meeper16 3917 days ago
Vector space models, replacing or combined with TF-IDF approaches, are a new way of summarizing and searching for meaning in documents...

http://52.11.1.7/TuataraSum/example_context_control-ml2.html

1 comments

Interesting. This basically uses the background word2vec data for the entire Web to provide more information and help with things like disambiguation, synonyms, etc? Am I understanding that correctly?

Maybe a nit-picky thought, but it's not clear to me that the TF-IDF part is what's doing most of the heavy lifting there.

Do you know of any good evaluations between using vector space data and other methods for summarization?

Word2Vec was a fork of, or was based on, a more exhaustive vector space approach described here: https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...

I've compared the summarization to others like OTS (http://libots.sourceforge.net/), which I believe relies strictly on TF-IDF. The vector space version seems better, and it allows a context to control the summarization.
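To make the "context controls the summarization" idea concrete, here is a rough sketch of one way it could work. It is not the code behind the linked demo. The toy 3-dimensional vectors, the sentence-level IDF, and the function names (`embed`, `summarize`) are all assumptions for illustration, and a real system would load pretrained word2vec embeddings instead. Each sentence becomes a TF-IDF-weighted average of its word vectors, and sentences are ranked by cosine similarity to a context word.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors, hand-made for illustration only.
# A real system would load pretrained word2vec embeddings instead.
TOY_VECTORS = {
    "bank": [0.9, 0.1, 0.0],
    "loan": [0.8, 0.2, 0.0],
    "money": [0.85, 0.15, 0.0],
    "river": [0.0, 0.1, 0.9],
    "water": [0.05, 0.1, 0.85],
    "fish": [0.0, 0.2, 0.8],
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    return [w.strip(".,!?").lower() for w in text.split()]

def idf_weights(sentences):
    """Smoothed inverse document frequency, treating each sentence as a document."""
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}

def embed(text, idf):
    """TF-IDF-weighted average of the vectors for the known words in text."""
    counts = Counter(w for w in tokenize(text) if w in TOY_VECTORS)
    total = [0.0, 0.0, 0.0]
    weight_sum = 0.0
    for word, tf in counts.items():
        weight = tf * idf.get(word, 1.0)  # unseen words get a neutral weight
        weight_sum += weight
        for i, x in enumerate(TOY_VECTORS[word]):
            total[i] += weight * x
    if weight_sum == 0.0:
        return total
    return [x / weight_sum for x in total]

def cosine(a, b):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def summarize(sentences, context, top_k=1):
    """Return the top_k sentences most similar to the context text."""
    idf = idf_weights(sentences)
    ctx = embed(context, idf)
    ranked = sorted(sentences, key=lambda s: cosine(embed(s, idf), ctx), reverse=True)
    return ranked[:top_k]

SENTENCES = [
    "The bank approved the loan.",
    "Fish swim in the river water.",
    "He sat by the river bank.",
]

print(summarize(SENTENCES, "money"))
print(summarize(SENTENCES, "water"))
```

With these toy vectors, the context "money" pulls out the loan sentence and "water" pulls out the fish sentence, even though neither context word appears in the sentence it selects. A pure TF-IDF word-overlap score would have nothing to match on in the "money" case.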

Other similar approaches might be based on Latent Semantic Analysis, Latent Semantic Indexing, or Latent Dirichlet Allocation (LDA).

Thanks for the links!