

by albertzeyer 3310 days ago

CNTK uses the LSTM implementation by CuDNN in their official LSTM layer.

TensorFlow has multiple LSTM implementations, such as LSTMCell, BasicLSTMCell, and LSTMBlockCell, plus one wrapper for CuDNN, and maybe more. I'm quite confident that this benchmark did not use the CuDNN wrapper for TensorFlow, which is a bit unfair, I would say. The CuDNN wrapper in TensorFlow does not support sequences of different lengths, but you can work around this by just ignoring the unused (padded) frames. See here for some more details:

https://stackoverflow.com/questions/41461670/cudnnrnnforward...
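To make "ignoring the unused frames" concrete, here is a minimal NumPy sketch, not the code from the benchmark or from any framework. The helper names (`pad_batch`, `frame_mask`, `masked_mean`) are made up for illustration, and the constant per-frame loss is only a stand-in for whatever you compute from the RNN outputs. The idea: pad every sequence to the longest one, run the fixed-length kernel on the padded batch, and mask out the padded frames afterwards.

```python
import numpy as np

def pad_batch(seqs):
    """Pad a list of (time, dim) arrays to one (max_time, batch, dim) array."""
    lengths = np.array([len(s) for s in seqs])
    max_time, dim = int(lengths.max()), seqs[0].shape[1]
    batch = np.zeros((max_time, len(seqs), dim), dtype=seqs[0].dtype)
    for i, s in enumerate(seqs):
        batch[:len(s), i] = s  # real frames first, zero padding at the end
    return batch, lengths

def frame_mask(lengths, max_time):
    """mask[t, b] is 1.0 for real frames and 0.0 for padded frames."""
    return (np.arange(max_time)[:, None] < lengths[None, :]).astype(np.float32)

def masked_mean(per_frame_values, mask):
    """Average over real frames only; padded frames contribute nothing."""
    return (per_frame_values * mask).sum() / mask.sum()

# Two sequences of length 3 and 5 with feature dimension 2.
seqs = [np.ones((3, 2), dtype=np.float32), np.ones((5, 2), dtype=np.float32)]
batch, lengths = pad_batch(seqs)
mask = frame_mask(lengths, batch.shape[0])

# Hypothetical stand-in for a per-frame loss of shape (max_time, batch).
per_frame_loss = np.full(mask.shape, 2.0, dtype=np.float32)
loss = masked_mean(per_frame_loss, mask)
```

With padding at the end, the outputs of a forward (unidirectional) LSTM at the real frames do not depend on the padded frames, so masking is enough there. The final state and the backward direction of a bidirectional LSTM do see the padding, so those need extra care.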

Note that you could also provide your own LSTM kernel for TensorFlow, which is what we do in our framework, and that can be really fast, although our benchmarks are a bit outdated.

https://github.com/rwth-i6/returnn