Also, an interesting recent piece of work [1] shows that, with the right control parameters, gated RNNs like LSTMs can still do pretty good language modeling.
[1] http://www.abigailsee.com/2019/08/13/what-makes-a-good-conve...