by jknz 3745 days ago
From my understanding, they ran word2vec [1] on their email dataset. Anyone can run word2vec on any dataset with a single desktop machine. What I don't get is why word2vec is not mentioned.

Edit: the algorithm mentioned is t-SNE [2], which seems to be another algorithm for dimension reduction. I don't know how it compares to word2vec.

[1] for instance, https://www.tensorflow.org/versions/r0.7/tutorials/word2vec/...

[2] https://lvdmaaten.github.io/tsne/

3 comments

Although the visualization is similar to what you might see in a word2vec demo, they haven't run word2vec here. There are many ways to generate word vectors; word2vec is one, but the method used here was a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network. Word vectors are high-dimensional (in this case, the dimension was 50), which makes them difficult to visualize directly. The t-SNE algorithm reduces the dimensionality to the point where you can plot the vectors and still compare different data points to some useful extent.
They didn't run word2vec. They built an LSTM-RNN (Long Short-Term Memory Recurrent Neural Network). They mention this in the caption of the image showing word clusters.

word2vec and LSTM-RNN both produce word embeddings, which are vector representations of words. They then applied t-SNE, a dimensionality reduction technique designed to map high-dimensional data down to 2 (or 3) dimensions while keeping similar points close together, which tends to give nicely separated clusters. It can do this for any "type" of vector, not just word embeddings.

So, word2vec and LSTM-RNN both make high-dimensional vectors out of words. t-SNE takes high-dimensional vectors and makes them 2-dimensional.
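
To make the second step concrete, here is a minimal sketch using scikit-learn's TSNE. The 50-dimensional vectors are random stand-ins, not the embeddings from the article; the number of "words", the three cluster centers, and the perplexity value are all assumptions picked for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for learned embeddings (NOT the article's data): 30 hypothetical
# "words" with 50 dimensions each, scattered around three random centers so
# there is some structure for t-SNE to pick up.
centers = rng.normal(scale=5.0, size=(3, 50))
vectors = np.vstack([c + rng.normal(size=(10, 50)) for c in centers])

# Reduce 50 dimensions to 2. perplexity must be smaller than the number of
# samples; 5 is an arbitrary choice for this toy data set.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
points_2d = tsne.fit_transform(vectors)

print(points_2d.shape)  # (30, 2): one (x, y) pair per input vector
```

Nothing here depends on where the vectors came from, so the same call would work on word2vec output, on LSTM embeddings, or on any other array of shape (n_samples, n_features).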

word2vec is an algorithm to produce meaningful "word embeddings", which are vector representations in a usually high-dimensional space. t-SNE is a dimensionality-reduction algorithm. Both can be used together, as they serve different purposes.
One could argue that word embeddings are also a dimensionality-reduction technique: words live in an infinite-dimensional space, and the embedding is a finite-dimensional projection of this infinite-dimensional space.
I think of word vectors in the opposite light. Words stored in a dictionary have one dimension (their index), making comparisons more or less random. Word vectors augment the information you have about a word by continually examining the contexts in which the word appears in a corpus of text.
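
A toy sketch of that point, using plain co-occurrence counts rather than word2vec or an LSTM (a much cruder technique, swapped in only because it fits in a few lines). The corpus, the window size, and the helper names are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Tiny made-up corpus; real embeddings are trained on far more text.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the dog",
    "i send an email today",
    "i read an email today",
]

def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

vecs = context_vectors(corpus)

# A dictionary index is a single number that says nothing about meaning.
vocab_index = {word: i for i, word in enumerate(sorted(vecs))}

sim_cat_dog = cosine(vecs["cat"], vecs["dog"])
sim_cat_email = cosine(vecs["cat"], vecs["email"])
print(vocab_index["cat"], vocab_index["dog"], vocab_index["email"])
print(sim_cat_dog, sim_cat_email)
```

In this corpus "cat" and "dog" share context words ("the", "drinks", "chases") while "cat" and "email" share none, so the context vectors separate them in a way the alphabetical index cannot.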
In fact, the typical way to visualize word2vec embeddings is t-SNE.