| HN Mirror

Well those are sort of equivalent. But yeah, we use cross-entropy between the projected output and the target emoji distribution as our objective to minimize.

And yes, we do the t-SNE on that pre-projection space. That's why we can visualize the targets (emoji) in it. We can also t-SNE the word embeddings themselves — the input to the RNN — which is also kind of interesting. It automatically learns all kinds of structures there. Chris Olah has a good post on word embeddings if you're interested: http://colah.github.io/posts/2014-07-NLP-RNNs-Representation...