by adamklec 3745 days ago
Hello. This is Adam. I trained the model and made the visualization. Thanks for your comments.

This model is not a POS tagger. The model was trained to predict the next word in an email given the preceding words. So in that sense, it's similar to the word2vec models discussed in the link you shared. However, for this work I used a recurrent neural network to learn a language model of the emails in our database.
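To make the training objective concrete (predict the next word from the preceding words), here is a minimal sketch of how (context, next word) pairs could be built from one tokenized email. The helper name `next_word_pairs` and the sample sentence are made up for illustration; this is not the actual preprocessing used for the model.

```python
def next_word_pairs(tokens):
    """Return (preceding words, next word) pairs for language-model training."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Hypothetical example email, already split into tokens.
email = ["can", "we", "meet", "tomorrow"]
pairs = next_word_pairs(email)

for context, target in pairs:
    print(context, "->", target)
```

A recurrent network would consume each context one word at a time and be scored on how much probability it assigns to the target word.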

After training, I extracted the learned word vectors from the model (they are the weights that connect the input layer, which uses a one-hot encoding of the vocabulary words, to the embedding layer). I then used the t-SNE algorithm to reduce the dimensionality of the learned word vectors and plotted them in 2 dimensions. The colors representing the parts of speech were added after the fact to show that the model had learned to distinguish between nouns, verbs, etc.

3 comments

Thanks Adam! It's nice work and it seems like there's a pretty epic dataset to analyze at x.ai. My main confusion was what the visualized vectors represented, but I guess you've answered that by saying they're the first layer in your model (if I'm interpreting correctly). What I don't quite understand is how you got the word vector from the inputs. It sounds like you represent each word as a one-hot encoding (similar to an index), and then you pass this one-hot encoding through the first layer, giving you the word vector for each input?
That's right. The weights that connect the Nth neuron in the one-hot input layer to the embedding layer can be thought of as a vector encoding of the Nth word in the vocabulary.
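A small sketch of why this works: multiplying a one-hot vector by the input-to-embedding weight matrix just selects one row of that matrix. The vocabulary, the weight values, and the helper names (`one_hot`, `matvec`) below are toy assumptions, not values from the trained model.

```python
def one_hot(index, size):
    """One-hot vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def matvec(vec, matrix):
    """Multiply a row vector by a matrix given as a list of rows."""
    n_cols = len(matrix[0])
    return [
        sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
        for c in range(n_cols)
    ]


# Toy vocabulary and toy weights: one row per word, one column per embedding unit.
vocab = ["meeting", "lunch", "schedule"]
weights = [
    [0.1, 0.2],
    [0.3, 0.4],
    [0.5, 0.6],
]

n = vocab.index("lunch")
embedding = matvec(one_hot(n, len(vocab)), weights)

# The product equals row n of the weight matrix, i.e. the word vector for "lunch".
print(embedding == weights[n])
```

In practice the multiplication is usually skipped and the row is looked up directly by index, which gives the same vector.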
How does the recurrent neural network technique compare to the CBOW technique in word2vec? CBOW would've been the first thing I tried.
I agree that's an interesting comparison to make but I'm not sure of the answer. The original purpose of this work was not to generate word vectors but rather to evaluate whether we have enough data to start using deep learning algorithms. That an RNN trained on our data was able to learn word vectors with a significant amount of structure seems like a positive sign. But I don't know how the quality of these word vectors would compare to vectors generated by more standard word2vec algorithms.
There are tons of ways to evaluate word vector quality! Word analogy tasks, word similarity tasks, contextual prediction tasks, etc.
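For instance, word similarity is typically scored with cosine similarity, and a word analogy task asks which word lies closest to vec(b) - vec(a) + vec(c). Here is a minimal sketch of both; the vectors are hand-made toy values chosen so the analogy works out, and the function names are illustrative rather than taken from any benchmark toolkit.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(vectors, a, b, c):
    """Answer 'a is to b as c is to ?' by nearest cosine neighbor, excluding a, b, c."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Toy 3-dimensional vectors (assumed values, not learned from data).
vectors = {
    "man": [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king": [1.0, 0.0, 1.0],
    "queen": [1.0, 1.0, 1.0],
    "lunch": [-1.0, 0.2, 0.0],
}

print(analogy(vectors, "man", "woman", "king"))
```

A real evaluation would run the same lookup over a full analogy dataset and report the fraction answered correctly.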

This link contains a bunch of relevant evaluation datasets and benchmarks obtained using word2vec, GloVe, etc. You can evaluate your RNN-learned vectors and compare them to traditionally trained word2vec vectors. Link here: http://www.bigdatalab.ac.cn/benchmark/bm/Domain?domain=Word%...

For more background on evaluating word vectors check out these pretty great lecture notes from Socher's NLP class: http://cs224d.stanford.edu/lecture_notes/LectureNotes2.pdf

Also, here are the original papers from a few years ago that introduced many of these datasets and evaluation standards:

https://papers.nips.cc/paper/5021-distributed-representation...

http://www.cs.cmu.edu/~mfaruqui/papers/acl14-vecdemo.pdf

As a starting point, you could read this nicely written review to learn more about RNNs: http://arxiv.org/abs/1506.00019
Implementation-wise, did you train it with one of the widespread Python libs or opt for one of the Scala (NLP) frameworks? If the latter, I'd be interested in which one worked for you for LSTMs (factorie is more probabilistic, mllib is afaik no good for compute graphs, d4j / sparkling water).
This work was done in Python using Theano and Keras.