Hacker News new | ask | show | jobs
by mark_l_watson 2918 days ago
Nice article, and I agree with the explanations of what makes Keras and TensorFlow best for specific use cases. Some history: I have used TensorFlow for years, switched to coding against the Keras APIs about 8 months ago. I wish I had more experience with PyTorch, but I just have the time right now to do more than just play with it.

One suggestion to the authors: the benchmark figures are interesting, but I wish you had shown CPU only results also. At work, I have all the GPU resources I need but for my home projects, which are all NLP deep learning experiments, I usually rent a many core large memory server with no GPUs (GPUs seem to speed up RNNs less than other model types).

2 comments

For most applications you can probably use a TCN (temporal convolutional network) instead of LSTM. TCN's are implemented in all major frameworks and work an order of magnitude faster because they are parallel.

https://arxiv.org/abs/1803.01271

https://arxiv.org/abs/1608.08242

I've been using a QRNN in PyTorch, is this similar to a TCN?

https://github.com/salesforce/pytorch-qrnn

No, TCN is similar to WaveNet (dilated convolutions + masking the future + residual connections). It's a plain convnet, not an LSTM with a twist. That's why it runs efficiently in parallel on GPUs, like image processing convnets.
Actually, yes, the QRNN has all of those features.

First figure from our paper: how the LSTM with a twist allows for the equivalent speed of a plain convnet by running efficiently in parallel on GPUs, like image processing convents.[1]

Best of all, as it's only an "LSTM with (these) twists", it's drop-in compatible with existing LSTMs but can get you a 2-17 times speed-up over NVIDIA's cuDNN LSTM - essentially speed equivalent to the TCN or WaveNet speed-up.

That's why Baidu implemented QRNN in their production Deep Voice 2 neural text-to-speech (TTS) system[3].

This isn't to say TCN or QRNN is better, simply that it's dangerous to flat out say _no_ if you're not actually certain or don't correctly recall the underlying information.

Disclaimer: I'm the co-author of the QRNN.

Double disclaimer: The TCN paper cites the QRNN but decides not to test against it. They also show results over one of my datasets.

[1]: https://www.semanticscholar.org/paper/Quasi-Recurrent-Neural...

[2]: https://github.com/salesforce/pytorch-qrnn

[3]: https://www.semanticscholar.org/paper/Deep-Voice-2%3A-Multi-...

Haha, small world. I knew about QRNN and even tried to find an implementation once to test it on my data. Neat idea.
Thanks! I am trying TCNs this weekend.
Glad you like the article and thanks for suggesting researching the CPU usage across these frameworks - it's something worth looking into.