> A RNN makes predictions based on sequential data. When a RNN is trained on sequences of words, it learns to represent each word as a high dimensional vector which encodes the model’s understanding of that word. By projecting these high dimensional vectors into a two dimensional space, it’s possible to visualize their relationships and glean insights into the concepts that the model has learned.

It sounds like what's being visualized are the probability vectors that the model outputs, which usually hold one value per possible class (noun, verb, etc. in this case). If so, I don't see how the t-SNE visualization is much more useful than a confusion matrix.

Typically, prior to training, words are translated from dictionary indexes into word embeddings (high-dimensional vectors whose dimension is much larger than the number of classes) that let you compare words and do vector algebra like "king - man + woman ≈ queen". You can visualize the word embeddings and color-code them by class after training to see whether any patterns show up.

> The RNN learned all of this semantic understanding without a human ever having to code a definition of concepts like nouns, verbs, universities, cities, meetings, or social media. This is the power of deep learning algorithms.

Was this an unsupervised approach? If so, that seems a little unusual for part-of-speech (POS) tagging. I suppose the author could mean that the model was used to label out-of-vocabulary (OOV) words, i.e. words that never appeared in the training set. Labeling OOV data points is sort of the general benefit of machine learning, and I'm not sure it can be attributed solely to deep learning. The main benefit I've gleaned from deep learning is that it automates the feature-engineering phase of the machine learning pipeline.

There are lots of good resources on RNNs, LSTMs, word embeddings, and t-SNE out there from Stanford, NYU, Theano, TensorFlow, and the like. Here's a blog post that gives some background if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...

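To make the vector-algebra idea concrete, here is a toy sketch of the commonly cited analogy king - man + woman ≈ queen. The two-dimensional vectors, the `EMBEDDINGS` table, and the `analogy` helper are all made up for illustration; real embeddings are learned and have far more dimensions.

```python
import math

# Hypothetical hand-built embeddings (not learned):
# axis 0 loosely means "royalty", axis 1 loosely means "gender".
EMBEDDINGS = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, embeddings=EMBEDDINGS):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [
        x - y + z
        for x, y, z in zip(embeddings[a], embeddings[b], embeddings[c])
    ]
    candidates = [w for w in embeddings if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, embeddings[w]))


print(analogy("king", "man", "woman"))
```

With learned embeddings the match is only approximate, which is why the nearest neighbor by cosine similarity is used instead of an exact equality check.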
This model is not a POS tagger. It was trained to predict the next word in an email given the preceding words. In that sense, it's similar to the word2vec models discussed in the link you shared. For this work, however, I used a recurrent neural network to learn a language model of the emails in our database.
After training, I extracted the learned word vectors from the model (they are the weights that connect the input layer, which uses a one-hot encoding of the vocabulary words, to the embedding layer). I then used the t-SNE algorithm to reduce the dimensionality of the learned word vectors and plotted them in two dimensions. The colors representing the parts of speech were added after the fact to show that the model had learned to distinguish between nouns, verbs, etc.
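As a rough illustration of why those input-to-embedding weights are the word vectors: multiplying a one-hot vector by the weight matrix just selects one row of it. The vocabulary, the matrix `W`, and the helper names below are hypothetical toy values, not the actual model.

```python
# Hypothetical toy vocabulary; the real model's vocabulary is much larger.
VOCAB = ["meeting", "email", "send", "schedule"]

# Hypothetical input-to-embedding weights: one row per vocabulary word,
# one column per embedding dimension (values made up for illustration).
W = [
    [0.2, -0.1, 0.7],
    [0.3, 0.0, 0.6],
    [-0.5, 0.8, 0.1],
    [-0.4, 0.9, 0.0],
]


def one_hot(index, size):
    """Return a vector of zeros with a single 1.0 at the given index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def embed(one_hot_vec, weights):
    """Multiply a one-hot row vector by the weight matrix."""
    n_dims = len(weights[0])
    return [
        sum(x * row[d] for x, row in zip(one_hot_vec, weights))
        for d in range(n_dims)
    ]


def word_vector(word):
    """Look up a word's embedding by pushing its one-hot code through W."""
    index = VOCAB.index(word)
    return embed(one_hot(index, len(VOCAB)), W)


# The product equals the matching row of W, so the rows are the word vectors.
print(word_vector("send") == W[VOCAB.index("send")])
```

The rows of such a matrix are what would then be handed to a dimensionality-reduction step like t-SNE for plotting.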