| HN Mirror



	by mtrimpe 3505 days ago
	I really loved reading this article, but it's always so hard to figure out exactly how these things work in detail. I understand matrix multiplication, but it seems that (some of) these matrix-to-vector calculations are actually trained by/as part of the neural net... and how exactly that works is what I can't figure out coming at it from articles like this.

2 comments

syllogism 3505 days ago

(Author here)

Thanks! I'm planning to make two follow up posts, on each of the systems, that go through those details. I blurred them out in this post because I wanted to get across this more abstract story about the data types and transformations.

There are lots of good posts about attention mechanisms. The WildML post is good, as is Chris Olah's post. Bidirectional RNNs are a little less well covered, but the idea is not too difficult to understand given a single RNN (or LSTM, GRU, etc.).
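To make the matrix-to-vector question a bit more concrete, here is a minimal sketch of one simple form of attention pooling in plain Python. It is an illustration under assumptions, not the exact mechanism from the article or from either linked post: the names `attention_pool` and `query`, and the toy numbers, are made up for this example. The only "learned" piece is `query`, which a real network would update by backpropagation together with all its other weights.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(rows, query):
    """Reduce a matrix (a list of row vectors) to a single vector.

    `query` is the hypothetical learned parameter: a vector with the
    same width as each row. Rows that score higher against it get a
    larger share of the weighted average.
    """
    scores = [sum(r * q for r, q in zip(row, query)) for row in rows]
    weights = softmax(scores)
    width = len(rows[0])
    pooled = [sum(w * row[j] for w, row in zip(weights, rows))
              for j in range(width)]
    return pooled, weights

# Three "token" vectors of width 2 (made-up numbers).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# An all-zero query scores every row equally, so this is a plain average.
pooled_flat, weights_flat = attention_pool(sentence, [0.0, 0.0])

# A query that favours the first dimension shifts weight to rows 0 and 2.
pooled_sharp, weights_sharp = attention_pool(sentence, [5.0, 0.0])
```

Every step here (dot product, softmax, weighted sum) is differentiable, which is why the query vector can be trained as part of the network rather than being set by hand.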

You should also read the papers :). This is how most people who are doing ML, including the people who build practical things and aren't researchers, are staying up to date. Academia is so competitive, and writing is cheap relative to experimentation. The deep learning literature is really pretty easy to follow.


dharma1 3505 days ago

Really like what you're doing with spaCy and Explosion AI, good stuff :)

What do you think about dilated convolutional encoder/decoder networks [1]? Useful for NLP beyond machine translation?

[1] https://arxiv.org/abs/1610.10099, https://github.com/paarthneekhara/byteNet-tensorflow


syllogism 3504 days ago

Thanks!

I don't understand those models very well yet. I haven't implemented one, or sat down with the paper and really worked through it.


viksit 3505 days ago

One of the main issues with character-level CNNs (irrespective of convolution type, IIRC) is the inability of the model to handle unknown words, which is something that word-level models do well. So if you look at applications of NLP in domains that need this to work well, you won't get much from purely character-level models, in my experience.


thess24 3504 days ago

Looking forward to those posts! Just want to say I am a big fan of spaCy and your site looks great.


herrkanin 3505 days ago

If you want to learn more about it, I recommend reading Neural Networks and Deep Learning: http://neuralnetworksanddeeplearning.com
