by mtrimpe
3505 days ago
I really loved reading this article, but it's always hard to figure out exactly how these things work in detail. I understand matrix multiplication, but it seems that (some of) these matrix-to-vector calculations are actually trained as part of the neural net, and I can't work out how that happens from articles like this.
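Here is a minimal sketch of what "trained as part of the net" means for a single matrix-to-vector step. It assumes a plain linear layer `y = W x` and a squared-error loss, and the names (`matvec`, `train_linear_map`) are hypothetical. In a real network the error signal for `W` would come from backpropagation through the layers above it rather than from a fixed target, but the update rule has the same shape: the gradient for `W` is the outer product of the output error and the input vector.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def train_linear_map(pairs, n_out, n_in, lr=0.1, steps=500):
    """Learn W so that W x is close to t for every (x, t) pair,
    using plain gradient descent on L = 0.5 * ||W x - t||^2."""
    W = [[0.0] * n_in for _ in range(n_out)]
    for _ in range(steps):
        for x, t in pairs:
            y = matvec(W, x)
            err = [y_i - t_i for y_i, t_i in zip(y, t)]  # dL/dy
            # dL/dW[i][j] = err[i] * x[j]  (outer product of error and input)
            for i in range(n_out):
                for j in range(n_in):
                    W[i][j] -= lr * err[i] * x[j]
    return W

# Usage: the matrix is never written down by hand; it is learned from examples.
pairs = [([1.0, 0.0], [2.0, -1.0]),
         ([0.0, 1.0], [0.5, 3.0])]
W = train_linear_map(pairs, n_out=2, n_in=2)
print(W)  # approximately [[2.0, 0.5], [-1.0, 3.0]]
```

With these two orthogonal inputs each pair only updates one column of `W`, which is why the learned matrix ends up with the targets as its columns.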
Thanks! I'm planning to make two follow up posts, on each of the systems, that go through those details. I blurred them out in this post because I wanted to get across this more abstract story about the data types and transformations.
There are lots of good posts about attention mechanisms; the WildML post is good, as is Chris Olah's. Bidirectional RNNs are less well covered, but the idea is not hard to understand once you understand a single RNN (or LSTM, GRU, etc.).
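As a rough illustration of both ideas, here is a sketch in plain Python. A bidirectional RNN is just two ordinary RNNs, one reading the sequence left to right and one reading it right to left, with their hidden states concatenated at each position. Attention then scores each of those states against a query vector, normalises the scores with a softmax, and takes the weighted sum. Assumptions: a simple tanh (Elman) cell instead of an LSTM/GRU, plain dot-product scoring (one of several scoring functions in the literature), and made-up weight values that would normally be learned; all function names are hypothetical.

```python
import math

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def rnn(xs, W_x, W_h):
    """Simple RNN: h_t = tanh(W_x x_t + W_h h_{t-1}), starting from h_0 = 0."""
    h = [0.0] * len(W_h)
    states = []
    for x in xs:
        pre = [a + b for a, b in zip(matvec(W_x, x), matvec(W_h, h))]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states

def bidirectional_rnn(xs, fwd, bwd):
    """Run one RNN left-to-right and another right-to-left, then
    concatenate their hidden states at each position."""
    forward = rnn(xs, *fwd)
    backward = rnn(xs[::-1], *bwd)[::-1]  # re-align with the original order
    return [f + b for f, b in zip(forward, backward)]  # list concatenation

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot_attention(query, keys):
    """Score each state by its dot product with the query, softmax the
    scores, and return the weighted sum of the states plus the weights."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    dim = len(keys[0])
    context = [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(dim)]
    return context, weights

# Usage with hypothetical, untrained weights: 3 inputs of size 2, hidden size 2.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fwd = ([[0.5, -0.3], [0.2, 0.8]], [[0.1, 0.0], [0.0, 0.1]])
bwd = ([[-0.4, 0.6], [0.7, 0.1]], [[0.2, 0.0], [0.0, 0.2]])
states = bidirectional_rnn(xs, fwd, bwd)  # 3 positions, each 4-dimensional
context, weights = dot_attention([1.0, 0.0, 0.0, 1.0], states)
print(weights, context)
```

The point of the backward pass is that each position's state summarises both what came before it and what comes after it, which is what the attention step then chooses among.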
You should also read the papers :). This is how most people doing ML (including people building practical things rather than doing research) stay up to date. Academia is so competitive, and writing is cheap relative to experimentation. The deep learning literature is really pretty easy to follow.