Hacker News
by madcowherd 4114 days ago
I'm wondering how this differs from the SemanticVectors package. I'll have to look into word2vec further.

Word2vec is the de facto standard implementation of neural word embeddings. There are other algorithms as well, such as GloVe [1], document embeddings (paragraph vectors) [2], and backpropagation-based methods [3]. Facebook also recently published a paper that beat word2vec [4]. Neural word embeddings are a neat way of representing concepts, and I see a great future in deep learning for automated feature engineering on text, with text joining audio and images.

[1]: http://nlp.stanford.edu/pubs/glove.pdf

[2]: http://cs.stanford.edu/~quocle/paragraph_vector.pdf

[3]: http://www.australianscience.com.au/research/google/35671.pd...

[4]: http://arxiv.org/abs/1502.01710
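For anyone who hasn't looked inside word2vec: one of its training setups is skip-gram with negative sampling, where each word's vector is nudged to score high (a sigmoid of a dot product) against words that actually appear near it, and low against a few randomly sampled words. Below is a minimal pure-Python sketch of that objective, assuming a toy whitespace-tokenized corpus. It is not the reference C implementation (no subsampling, learning-rate decay, or hierarchical softmax), and the corpus, `train_sgns`, and all hyperparameter values are hypothetical choices for illustration. Negatives are drawn from the unigram distribution raised to the 3/4 power, as described in the word2vec papers.

```python
import math
import random


def sigmoid(x: float) -> float:
    # Clamp the input so math.exp cannot overflow.
    x = max(-30.0, min(30.0, x))
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def train_sgns(sentences, dim=16, window=2, negatives=3, epochs=50, lr=0.05, seed=0):
    """Skip-gram with negative sampling (hypothetical helper); returns {word: input vector}."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    # Input vectors start small and random; output vectors start at zero.
    w_in = [[(rng.random() - 0.5) / dim for _ in range(dim)] for _ in vocab]
    w_out = [[0.0] * dim for _ in vocab]

    # Negative-sampling weights: unigram counts raised to the 3/4 power.
    counts = [0] * len(vocab)
    for s in sentences:
        for w in s:
            counts[index[w]] += 1
    weights = [c ** 0.75 for c in counts]
    ids_all = range(len(vocab))

    for _ in range(epochs):
        for s in sentences:
            ids = [index[w] for w in s]
            for pos, center in enumerate(ids):
                for ctx_pos in range(max(0, pos - window), min(len(ids), pos + window + 1)):
                    if ctx_pos == pos:
                        continue
                    positive = ids[ctx_pos]
                    targets = [(positive, 1.0)]
                    for t in rng.choices(ids_all, weights=weights, k=negatives):
                        if t != positive:  # skip negatives that collide with the true context word
                            targets.append((t, 0.0))
                    v_c = w_in[center]
                    grad_c = [0.0] * dim
                    for t, label in targets:
                        v_t = w_out[t]
                        # Log-sigmoid loss gradient, scaled by the learning rate.
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v_c, v_t))))
                        for k in range(dim):
                            grad_c[k] += g * v_t[k]  # uses v_t before it is updated
                            v_t[k] += g * v_c[k]
                    # Apply the accumulated update to the center word's input vector.
                    for k in range(dim):
                        v_c[k] += grad_c[k]
    return {w: w_in[i] for w, i in index.items()}


corpus = [s.split() for s in [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased a mouse",
    "a dog chased a cat",
]]
vecs = train_sgns(corpus)
print(cosine(vecs["cat"], vecs["dog"]))
```

On a corpus this small the similarities are noisy; the point is the update rule, which follows word2vec's pattern of accumulating the center word's gradient across all targets and applying it once.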

This is my first time seeing the package, but from the docs it appears to implement LSA (latent semantic analysis). The major difference is that word2vec dramatically outperforms LSA on a variety of tasks (http://datascience.stackexchange.com/questions/678/what-are-...). In my experience, LSA's vector representations can be underwhelming and perform poorly. I can't comment on the Random Projection and Reflective Random Indexing techniques that SemanticVectors implements.
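For comparison, LSA builds a term-document matrix and keeps only its top-k singular directions (a truncated SVD). The sketch below is a minimal, dependency-free illustration of that idea, not SemanticVectors's code. Assumptions: raw counts with no weighting (real LSA setups commonly apply tf-idf or log-entropy weighting), toy documents that are hypothetical, and power iteration on X^T X standing in for a library SVD routine. `lsa`, `top_eigenpairs`, and `term_doc_matrix` are illustrative names.

```python
import math
import random


def term_doc_matrix(docs):
    """Raw term counts: one row per term, one column per document."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    X = [[0.0] * len(docs) for _ in vocab]
    for j, d in enumerate(docs):
        for w in d.split():
            X[row[w]][j] += 1.0
    return vocab, X


def top_eigenpairs(G, k, iters=500, seed=0):
    """Top-k eigenpairs of a symmetric PSD matrix via power iteration plus deflation."""
    rng = random.Random(seed)
    n = len(G)
    G = [r[:] for r in G]  # work on a copy; deflation modifies it
    pairs = []
    for _ in range(k):
        v = [rng.random() for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(G[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam < 1e-12:
                break
            v = [x / lam for x in w]
        if lam < 1e-12:
            break  # fewer than k nonzero components remain
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        for i in range(n):
            for j in range(n):
                G[i][j] -= lam * v[i] * v[j]
    return pairs


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsa(docs, k=2):
    """Return ({term: k-dim vector}, singular values) from a rank-k truncated SVD."""
    vocab, X = term_doc_matrix(docs)
    n = len(docs)
    # G = X^T X: its eigenvectors are X's right singular vectors and its
    # eigenvalues are the squared singular values.
    G = [[sum(r[a] * r[b] for r in X) for b in range(n)] for a in range(n)]
    pairs = top_eigenpairs(G, k)
    term_vecs = {w: [] for w in vocab}
    for _, v in pairs:
        for w, r in zip(vocab, X):
            # X v = sigma * u: term coordinates scaled by the singular value.
            term_vecs[w].append(sum(x * y for x, y in zip(r, v)))
    return term_vecs, [math.sqrt(lam) for lam, _ in pairs]


docs = [
    "human computer interface",
    "computer user interface system",
    "user system response time",
    "graph trees",
    "graph minors trees",
]
term_vecs, sigmas = lsa(docs, k=2)
print(sigmas)
print(cosine(term_vecs["graph"], term_vecs["minors"]), cosine(term_vecs["graph"], term_vecs["human"]))
```

Forming X^T X is fine for a toy corpus but squares the matrix's condition number; a real implementation would use a proper (sparse) SVD routine instead.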

This link is about document distances, but it still gives a nice comparison of the other techniques: http://datascience.stackexchange.com/questions/678/what-are-...

Sorry, I should have asked specifically how it differs from random indexing/projection. This immediately reminded me of a similar inference example that uses random indexing/projection:

https://code.google.com/p/semanticvectors/wiki/PredicationBa...
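Random indexing, by contrast, skips the matrix factorization entirely: every word gets a fixed sparse random "index vector" (a handful of +1/-1 entries in a high-dimensional space), and a word's context vector is simply the running sum of the index vectors of its neighbours. Because sparse random vectors are nearly orthogonal, this approximates a random projection of the word co-occurrence matrix. The sketch below shows only the basic sliding-window variant, assuming a toy corpus; it is not SemanticVectors's code and does not cover Reflective Random Indexing or the predication-based inference on the linked page. `random_indexing`, `dim`, `nonzeros`, and `window` are hypothetical names and values.

```python
import math
import random


def index_vector(rng, dim, nonzeros):
    """Sparse ternary random vector: a few +1/-1 entries, the rest zero."""
    vec = [0.0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((1.0, -1.0))
    return vec


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def random_indexing(sentences, dim=512, nonzeros=8, window=2, seed=0):
    """Return {word: context vector} built by summing neighbours' index vectors."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    # Each word gets a fixed random index vector; in high dimensions these are
    # nearly orthogonal, so they stand in for one-hot word identities.
    index = {w: index_vector(rng, dim, nonzeros) for w in vocab}
    context = {w: [0.0] * dim for w in vocab}
    for s in sentences:
        for pos, word in enumerate(s):
            for other in range(max(0, pos - window), min(len(s), pos + window + 1)):
                if other == pos:
                    continue
                ctx = context[word]
                for k, x in enumerate(index[s[other]]):
                    ctx[k] += x
    return context


corpus = [s.split() for s in ["the cat sat on the mat", "the dog sat on the mat"]]
vecs = random_indexing(corpus)
print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["mat"]))
```

Here "cat" and "dog" occur in identical windows, so their context vectors come out identical, while "mat" shares only part of its context. Since everything is a running sum, new text can be folded in without recomputing anything, which is one argument often made for random indexing over SVD-based LSA.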