For most applications you can probably use a TCN (temporal convolutional network) instead of an LSTM. TCNs are implemented in all major frameworks and run about an order of magnitude faster because they process all timesteps in parallel.
No, TCN is similar to WaveNet (dilated convolutions + masking the future + residual connections). It's a plain convnet, not an LSTM with a twist. That's why it runs efficiently in parallel on GPUs, like image processing convnets.
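To make those three ingredients concrete, here is a minimal pure-Python sketch of a TCN-style stack on a single-channel sequence. It is an illustration only, not the TCN paper's code: the names `causal_dilated_conv`, `residual_block` and `tiny_tcn`, the fixed weights, and the ReLU placement are all assumptions, and a real model would use a framework's 1D convolution with learned weights.

```python
def causal_dilated_conv(x, weights, dilation):
    """y[t] = sum_k weights[k] * x[t - k * dilation].

    Inputs before t = 0 are treated as zero, so no output ever
    depends on a future timestep ("masking the future").
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x, weights, dilation):
    """Causal dilated convolution, ReLU, then a residual (skip) connection."""
    conv = causal_dilated_conv(x, weights, dilation)
    return [xi + max(0.0, ci) for xi, ci in zip(x, conv)]


def tiny_tcn(x, weights=(0.5, 0.5), num_layers=3):
    """Stack blocks with dilations 1, 2, 4, ... (hypothetical fixed weights)."""
    h = list(x)
    for layer in range(num_layers):
        h = residual_block(h, weights, dilation=2 ** layer)
    return h
```

Every output position only reads earlier inputs and there is no state carried from one timestep to the next, which is why the per-timestep work can all run at once on a GPU. Doubling the dilation per layer is what lets a short stack see a long history.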
Actually, yes, the QRNN has all of those features.
The first figure from our paper shows how the "LSTM with a twist" matches the speed of a plain convnet by running efficiently in parallel on GPUs, like image processing convnets.[1]
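As a rough illustration of that split, here is a single-channel, pure-Python sketch of a QRNN-style layer with a width-2 causal convolution and fo-pooling. The function name `qrnn_layer`, the scalar weights and the zero initial state are assumptions made for brevity; this is not the code from the repository linked in this comment.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def qrnn_layer(x, w_z, w_f, w_o):
    """Single-channel QRNN-style layer; each w_* is a (prev, current) weight pair."""
    # Step 1: candidate and gates. Each timestep reads only x[t-1] and x[t],
    # never a hidden state, so this loop is really one masked convolution
    # that can be computed for all timesteps in parallel.
    z, f, o = [], [], []
    for t in range(len(x)):
        prev = x[t - 1] if t > 0 else 0.0
        z.append(math.tanh(w_z[0] * prev + w_z[1] * x[t]))
        f.append(sigmoid(w_f[0] * prev + w_f[1] * x[t]))
        o.append(sigmoid(w_o[0] * prev + w_o[1] * x[t]))

    # Step 2: fo-pooling. The only sequential part is this cheap
    # elementwise recurrence; there is no matrix multiply per timestep.
    c = 0.0
    h = []
    for t in range(len(x)):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h.append(o[t] * c)
    return h
```

The gating still looks like an LSTM's forget and output gates, but because the gates never depend on the previous hidden state, the expensive part is convolutional and only the pooling step walks the sequence in order.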
Best of all, as it's only an "LSTM with (these) twists", it's drop-in compatible with existing LSTMs but can get you a 2-17x speed-up over NVIDIA's cuDNN LSTM, which is essentially equivalent to the TCN or WaveNet speed-up.
That's why Baidu implemented QRNN in their production Deep Voice 2 neural text-to-speech (TTS) system[3].
This isn't to say that either TCN or QRNN is better, simply that it's dangerous to flat-out say _no_ if you're not actually certain or don't correctly recall the underlying information.
Disclaimer: I'm the co-author of the QRNN.
Double disclaimer: The TCN paper cites the QRNN but decides not to test against it. They also show results on one of my datasets.
https://github.com/salesforce/pytorch-qrnn