by baalimago 2487 days ago
>Next we shall take a moment to remember the fallen heros, without whom we would not be where we are today. I am, of course, referring to the RNNs - Recurrent Neural Networks, a concept that became almost synonymous with NLP in the deep learning field.

XLNet (https://arxiv.org/abs/1906.08237) is in essence a recurrent neural network: it uses a transformer (itself built from neural networks) that recurrently carries context from one batch to the next. It is true, though, that gated RNNs such as AWD-LSTM and GRU are losing ground to the superior transformer architectures.

That's my only complaint though, excellent theoretical introduction.

Although, if anyone wants to actually implement a transformer, beware that you will want a GPU with 8+ GB of memory available, or be prepared to use cloud computing (Google Colab is free, for now). Training neural networks is still quite hardware dependent.

3 comments

Scaleway (where the author of this post works, as I do) is a cloud service provider with a pretty interesting GPU instance: a 16 GB NVIDIA Tesla P100 at 1€ per hour.
RNNs are still useful for genuinely time-dependent sequences such as activity detection or self-driving car steering, though even those are being enhanced with attention. The use of RNNs in NLP was more of a necessity: no other deep learning models could deliver results on the arguably sequential nature of NLP (a quite imperfect assumption, let's say). Because attention views the whole input at once, it is easier for a non-linear optimizer to set meaningful weights without resorting to recurrence, though that comes at a massive memory cost (i.e. forget about using a 2080 Ti for NLP).
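To put a rough number on that memory cost, here is a back-of-the-envelope sketch. It only counts the attention score matrices of a single layer, which grow with the square of the sequence length. The function name, the 12 heads, the batch size of 8 and the 4-byte float32 values are illustrative assumptions, not figures from the article or from any particular model, and the estimate ignores weights, other activations, gradients and optimizer state.

```python
def attention_scores_bytes(seq_len, n_heads, batch_size, bytes_per_value=4):
    """Bytes needed for one layer's attention score matrices.

    Shape is (batch_size, n_heads, seq_len, seq_len); float32 is assumed
    by default. All parameter values used below are hypothetical.
    """
    return batch_size * n_heads * seq_len * seq_len * bytes_per_value


# Illustrative settings only: 12 heads, batch of 8 sequences.
for seq_len in (128, 512, 2048):
    mib = attention_scores_bytes(seq_len, n_heads=12, batch_size=8) / 2**20
    print(f"seq_len={seq_len:5d}: {mib:8.1f} MiB per layer")
```

Under these assumptions, quadrupling the sequence length multiplies the figure by 16, and training has to keep these matrices around for every layer during the backward pass, which is where consumer cards run out of room.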
I was going to mention XLNet before I saw your comment.

Also, a recent piece of interesting work [1] shows that with the right control parameters, you could still use gated RNNs, like LSTMs, for pretty good language modeling.

[1] http://www.abigailsee.com/2019/08/13/what-makes-a-good-conve...