From my understanding, they ran word2vec [1] on their email dataset. Anyone can run word2vec on any dataset on a single desktop machine. What I don't get is why word2vec is not mentioned.
Edit: the algorithm mentioned is t-SNE, which seems to be another algorithm for dimensionality reduction. I don't know how it compares to word2vec.
Although the visualization is similar to what you might see in a word2vec demo, they didn't run word2vec here. There are many ways to generate word vectors, and word2vec is one of them, but the method used here was a Recurrent Neural Network (RNN), specifically a Long Short-Term Memory (LSTM) network. Word vectors can have very high dimensionality (here the dimension was 50), which makes them hard to visualize directly. The t-SNE algorithm reduces the dimensionality to the point where you can plot the vectors and still compare different data points to some useful extent.
They didn't run word2vec. They built an LSTM-RNN (Long Short-Term Memory Recurrent Neural Network). They mention this in the caption of the image showing word clusters.
word2vec and LSTM-RNN both produce word embeddings, which are vector representations of words. They then applied t-SNE, a dimensionality reduction technique designed to produce nicely separated 2-dimensional clusters from high-dimensional data. It can do this for any "type" of vector, not just word embeddings.
So, word2vec and LSTM-RNN both make high-dimensional vectors out of words. t-SNE takes high-dimensional vectors and makes them 2-dimensional.
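To make the second half of that concrete, here is a minimal sketch of the t-SNE step alone, assuming scikit-learn and NumPy are installed. The 50-dimensional vectors are random stand-ins grouped around three made-up centres, not real embeddings from word2vec or an LSTM, and the perplexity value is an arbitrary choice for a toy input of 30 points.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Stand-in for learned embeddings: 30 "words" in 50 dimensions,
# scattered around three hypothetical cluster centres.
centres = rng.normal(scale=5.0, size=(3, 50))
embeddings = np.vstack([c + rng.normal(size=(10, 50)) for c in centres])

# t-SNE only sees vectors; it does not care which model produced them.
# perplexity must be smaller than the number of points (30 here).
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)

print(embeddings.shape, "->", coords.shape)
```

Each row of `coords` is an (x, y) position you could hand to a scatter plot. Swapping the random matrix for real word vectors is the only change needed to get a word-cluster picture.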
word2vec is an algorithm that produces meaningful "word embeddings", which are vector representations in a (usually) high-dimensional space. t-SNE is a dimensionality-reduction algorithm. The two can be used together, as they serve different purposes.
One could argue that word embeddings are also a dimensionality-reduction technique: words live in an infinite-dimensional space, and an embedding is a finite-dimensional projection of that space.
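A rough way to picture that projection, as a sketch only: take a finite vocabulary as a stand-in for the much larger space, give each word a one-hot vector with one dimension per vocabulary entry, and multiply by an embedding matrix. The vocabulary, the embedding size of 3, and the random matrix below are all made up for illustration; a real model would learn the matrix.

```python
import random

random.seed(0)

vocab = ["email", "meeting", "invoice", "lunch", "deadline"]  # toy vocabulary
dim = 3  # hypothetical embedding size, much smaller than a real vocabulary

# Hypothetical embedding matrix: one row of `dim` numbers per word.
embedding_matrix = [[random.uniform(-1, 1) for _ in range(dim)] for _ in vocab]

def one_hot(word):
    """Sparse view of a word: one dimension per vocabulary entry."""
    return [1.0 if w == word else 0.0 for w in vocab]

def project(vec):
    """Multiply a length-V vector by the V x dim matrix."""
    return [
        sum(vec[i] * embedding_matrix[i][j] for i in range(len(vocab)))
        for j in range(dim)
    ]

sparse = one_hot("invoice")
dense = project(sparse)
print(len(sparse), "dimensions ->", len(dense), "dimensions")
```

Because the input is one-hot, the product just selects the matching row of the matrix, which is why embedding layers are usually implemented as a table lookup rather than an actual multiplication.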
I think of word vectors in the opposite light. Words stored in a dictionary have one dimension (their index), which makes comparisons between them more or less arbitrary. Word vectors augment the information you have about a word by repeatedly examining the contexts in which the word appears in a corpus of text.
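A toy sketch of that idea, using plain co-occurrence counts rather than word2vec or an LSTM (neither of which works this simply): each word's vector is the count of words seen within two positions of it. The corpus and the window size are invented for the example.

```python
from collections import Counter, defaultdict
from math import sqrt

# Made-up corpus; any tokenized text would do.
corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the fish",
    "the dog ate the bone",
    "please send the invoice today",
    "please pay the invoice today",
]

window = 2  # how many neighbours on each side count as context
contexts = defaultdict(Counter)
for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                contexts[word][tokens[j]] += 1

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# The one-dimensional "dictionary" view: an alphabetical index per word.
index = {w: i for i, w in enumerate(sorted(contexts))}

for other in ("dog", "invoice"):
    print("cat vs", other,
          "| index gap:", abs(index["cat"] - index[other]),
          "| context similarity:", round(cosine(contexts["cat"], contexts[other]), 3))
```

The index gap says nothing about meaning, while the context vectors put "cat" closer to "dog" than to "invoice" because the first two appear in the same surroundings.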