| HN Mirror



	by meeper16 3917 days ago
	Replacing TF-IDF with vector space approaches, or combining the two, is a new way of summarizing and searching for meaning in documents... http://52.11.1.7/TuataraSum/example_context_control-ml2.html

1 comments

gibrown 3917 days ago

Interesting. This basically uses the background word2vec data for the entire Web to provide more information and help with things like disambiguation, synonyms, etc.? Am I understanding that correctly?

Maybe a nit-picky thought, but it's not clear to me that the TF-IDF part is what's doing a lot of the extra lifting there.

Do you know of any good evaluations between using vector space data and other methods for summarization?


meeper16 3917 days ago

Word2Vec was a fork of, or based on, a more exhaustive vector space approach described here: https://www.kaggle.com/c/word2vec-nlp-tutorial/forums/t/1234...

I've compared the summarization to others like OTS (http://libots.sourceforge.net/), which I believe relies strictly on TF-IDF. This approach seems better, and it allows context to control the summarization.

Other similar approaches might be based on Latent Semantic Analysis, Latent Semantic Indexing or LDA.
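
For illustration only, here is a minimal Python sketch of one way TF-IDF weights and word vectors could be combined so that a context word steers an extractive summary. It is not the method behind the linked demo or OTS. The function names are hypothetical, and the 2-D vectors in TOY_VECTORS are hand-made stand-ins for real word2vec embeddings (axis 0 is roughly "finance", axis 1 roughly "nature").

```python
import math
import re
from collections import Counter

# Hand-made 2-D stand-ins for word2vec vectors (assumption, not real embeddings).
TOY_VECTORS = {
    "bank": [0.6, 0.6],      # deliberately ambiguous between the two senses
    "loan": [1.0, 0.0],
    "money": [1.0, 0.0],
    "interest": [0.9, 0.1],
    "river": [0.0, 1.0],
    "water": [0.0, 1.0],
    "fishing": [0.1, 0.9],
}

SENTENCES = [
    "The bank approved the loan at low interest.",
    "We went fishing on the river bank.",
    "Lunch was served at noon.",
]


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_weights(sentences):
    """Treat each sentence as a document; return one {term: weight} dict each."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)
    weights = []
    for doc in docs:
        total = sum(doc.values()) or 1
        weights.append({
            term: (count / total) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in doc.items()
        })
    return weights


def embed(term_weights, word_vectors, dim):
    """Weighted average of word vectors; terms without a vector are skipped."""
    acc = [0.0] * dim
    total = 0.0
    for term, weight in term_weights.items():
        vec = word_vectors.get(term)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            acc[i] += weight * value
        total += weight
    return [x / total for x in acc] if total else acc


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def summarize(sentences, word_vectors, context, k=1):
    """Return the k sentences closest to the context, in original order."""
    dim = len(next(iter(word_vectors.values())))
    context_vec = embed({t: 1.0 for t in tokenize(context)}, word_vectors, dim)
    scored = [
        (cosine(embed(w, word_vectors, dim), context_vec), i)
        for i, w in enumerate(tfidf_weights(sentences))
    ]
    top = sorted(scored, key=lambda pair: -pair[0])[:k]
    return [sentences[i] for _, i in sorted(top, key=lambda pair: pair[1])]


if __name__ == "__main__":
    print(summarize(SENTENCES, TOY_VECTORS, "money"))
    print(summarize(SENTENCES, TOY_VECTORS, "water"))
```

With these toy vectors, the context "money" selects the loan sentence and "water" selects the river sentence, even though neither context word appears in the text. A summarizer that only matches TF-IDF terms literally would have no way to use that context.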


gibrown 3917 days ago

Thanks for the links!
