

	by madcowherd 4114 days ago
	Wondering how this differs from the SemanticVectors package? Will have to look into word2vec further.

2 comments

agibsonccc 4114 days ago

Word2vec is usually treated as the standard implementation of neural word embeddings. There are other approaches as well, such as GloVe[1], document embeddings[2], and backpropagation-based methods[3]. Facebook also recently came out with a paper that beat word2vec[4]. Neural word embeddings are a neat way of representing concepts. I see a great future in deep learning for automated feature engineering with text (joining audio and images).

[1]: http://nlp.stanford.edu/pubs/glove.pdf

[2]: http://cs.stanford.edu/~quocle/paragraph_vector.pdf

[3]: http://www.australianscience.com.au/research/google/35671.pd...

[4]: http://arxiv.org/abs/1502.01710


juxtaposicion 4114 days ago

It's my first time seeing the package, but from the docs it looks like it implements LSA. The major difference is that word2vec dramatically outperforms LSA on a variety of tasks (http://datascience.stackexchange.com/questions/678/what-are-...). In my experience, the vector representations LSA produces can be underwhelming and perform poorly. I can't comment on the Random Projection and Reflective Random Indexing techniques that SemanticVectors implements.

This link is about document distances but still compares other techniques nicely: http://datascience.stackexchange.com/questions/678/what-are-...


madcowherd 4114 days ago

Sorry, I should have specifically mentioned how it differs from random indexing/projection. I was immediately reminded of a similar inference example using random indexing/projection.

https://code.google.com/p/semanticvectors/wiki/PredicationBa...
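For anyone unfamiliar with random indexing, here is a rough sketch of the basic idea in plain Python. It is not how SemanticVectors implements it; the function names (`index_vector`, `train`, `cosine`) and the parameter values (`dim`, `nonzeros`, `window`) are made up for illustration. Each word gets a fixed sparse random "index vector", and a word's "context vector" is the sum of the index vectors of the words that appear near it.

```python
import math
import random
from collections import defaultdict


def index_vector(dim, nonzeros, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros everywhere else."""
    vec = [0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzeros)):
        vec[pos] = 1 if i % 2 == 0 else -1  # alternate signs: half +1, half -1
    return vec


def train(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Build context vectors by summing the index vectors of nearby words."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    index = {}
    context = defaultdict(lambda: [0] * dim)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for tok in tokens:
            if tok not in index:
                index[tok] = index_vector(dim, nonzeros, rng)
        for i, tok in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j == i:
                    continue  # a word is not its own context
                row = context[tok]
                for k, value in enumerate(index[tokens[j]]):
                    if value:
                        row[k] += value
    return dict(context)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    corpus = [
        "the cat drinks milk",
        "the dog drinks milk",
        "stocks fell sharply today",
        "stocks rose sharply today",
    ]
    vectors = train(corpus)
    print("cat vs dog:   ", cosine(vectors["cat"], vectors["dog"]))
    print("cat vs stocks:", cosine(vectors["cat"], vectors["stocks"]))
```

Words that share contexts ("cat" and "dog" here) end up with similar context vectors, while unrelated words stay close to orthogonal because sparse random vectors rarely overlap. Unlike word2vec, nothing is learned by gradient descent; the vectors are just accumulated counts in a randomly projected space.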
