Hacker News
by syllogism 3505 days ago
(Author here)

Thanks! I'm planning to write two follow-up posts, one on each of the systems, that go through those details. I blurred them out in this post because I wanted to get across the more abstract story about the data types and transformations.

There are lots of good posts about attention mechanisms. The WildML post is good, as is Chris Olah's post. Bidirectional RNNs are a little bit less well covered, but the idea is not too difficult to understand given a single RNN (or LSTM, GRU etc).
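To make the bidirectional idea concrete, here is a minimal sketch in plain Python. It is not taken from the post or from any library: the function names, the tiny Elman-style cell, and the random weights are all assumptions made for illustration. The point is only that a bidirectional RNN is two ordinary RNNs, one reading left to right and one reading right to left, with their per-token states concatenated.

```python
import math
import random

def rnn_step(W_x, W_h, x, h):
    """One Elman-style step: h' = tanh(W_x x + W_h h). Hypothetical toy cell."""
    return [
        math.tanh(
            sum(wx * xi for wx, xi in zip(W_x[j], x))
            + sum(wh * hi for wh, hi in zip(W_h[j], h))
        )
        for j in range(len(W_h))
    ]

def run_rnn(W_x, W_h, inputs):
    """Return the hidden state after each input, in the order the inputs were given."""
    h = [0.0] * len(W_h)
    states = []
    for x in inputs:
        h = rnn_step(W_x, W_h, x, h)
        states.append(h)
    return states

def bidirectional_rnn(fwd, bwd, inputs):
    """Concatenate left-to-right and right-to-left states for each position."""
    forward = run_rnn(*fwd, inputs)
    # Run the second RNN over the reversed sequence, then flip its states
    # back so that index i again refers to token i.
    backward = run_rnn(*bwd, inputs[::-1])[::-1]
    return [f + b for f, b in zip(forward, backward)]

def random_weights(rng, rows, cols):
    """Illustrative random initialisation, not a trained model."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
hidden, dim = 3, 2
fwd = (random_weights(rng, hidden, dim), random_weights(rng, hidden, hidden))
bwd = (random_weights(rng, hidden, dim), random_weights(rng, hidden, hidden))

# Four "tokens", each a 2-dimensional vector.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
outputs = bidirectional_rnn(fwd, bwd, sentence)
```

Each entry of `outputs` has `2 * hidden` values: the first half summarises the tokens up to and including that position, the second half summarises the tokens from that position to the end. Swapping the toy cell for an LSTM or GRU changes `rnn_step` but not the overall structure.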

You should also read the papers :). This is how most people doing ML stay up to date, including people who build practical things and aren't researchers. Academia is so competitive, and writing is cheap relative to experimentation. The deep learning literature is really pretty easy to follow.

2 comments

Really like what you're doing with spaCy and Explosion AI, good stuff :)

What do you think about dilated convolutional encoder/decoder networks [1]? Useful for NLP beyond machine translation?

[1] https://arxiv.org/abs/1610.10099, https://github.com/paarthneekhara/byteNet-tensorflow

Thanks!

I don't understand those models very well yet. I haven't implemented one, or sat down with the paper and really worked through it.

One of the main issues with character-level CNNs (irrespective of convolution type, IIRC) is that the model can't handle unknown words, which is something word-level models do well. So if you look at NLP applications in domains that need this to work well, you won't get much from purely character-level models, in my experience.

Looking forward to those posts! Just want to say I am a big fan of spaCy and your site looks great.