by cromulen
3301 days ago
They refer to this benchmark in the blog post: http://dlbench.comp.hkbu.edu.hk/ There is also the v7 benchmark, which was run on a lot more hardware and where TensorFlow fares a bit better: http://dlbench.comp.hkbu.edu.hk/?v=v7 Does anyone know whether TF had a performance regression between v0.11 and v1.0, or whether it was just lucky on benchmark v7 and unlucky on v8? Also, how does CNTK manage to be that much better than everyone else on LSTMs? Its ability to scale to bigger batch sizes is unreal. It is an order of magnitude faster than the other frameworks.
|
TensorFlow has multiple LSTM implementations, such as LSTMCell, BasicLSTMCell, and LSTMBlockCell, plus a wrapper for CuDNN, and maybe more. I'm quite confident that this benchmark did not use the CuDNN wrapper for TensorFlow, which I would say is a bit unfair. The CuDNN wrapper in TensorFlow does not support sequences of different lengths, but you can work around this by padding to a common length and just ignoring the unused frames. See here for some more details:
https://stackoverflow.com/questions/41461670/cudnnrnnforward...
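To make "ignoring the unused frames" concrete, here is a minimal sketch in plain NumPy, not the TensorFlow or CuDNN API. It assumes the batch was padded at the end to a common length and that a fixed-length RNN has already produced one loss value per frame; `length_mask`, `masked_mean_loss`, and the numbers are hypothetical names and data for illustration only. Note that this is only exact for a unidirectional LSTM, where trailing padding cannot influence earlier frames.

```python
import numpy as np

def length_mask(seq_lens, max_len):
    """Return a (batch, max_len) float mask: 1.0 for real frames, 0.0 for padding."""
    steps = np.arange(max_len)[None, :]
    return (steps < np.asarray(seq_lens)[:, None]).astype(np.float32)

def masked_mean_loss(frame_losses, seq_lens):
    """Average per-frame losses over real frames only, ignoring padded ones."""
    mask = length_mask(seq_lens, frame_losses.shape[1])
    return float((frame_losses * mask).sum() / mask.sum())

# Hypothetical per-frame losses from a fixed-length RNN over a padded batch.
# The 9.0 entries stand for garbage values computed on padded frames.
frame_losses = np.array([[1.0, 2.0, 3.0, 9.0],
                         [4.0, 9.0, 9.0, 9.0]], dtype=np.float32)
seq_lens = [3, 1]  # true lengths of the two sequences

loss = masked_mean_loss(frame_losses, seq_lens)
print(loss)
```

The same mask can be multiplied into the RNN outputs before any pooling over time, so that padded frames contribute nothing downstream.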
Note that you could also provide your own LSTM kernel for TensorFlow, which is what we do in our framework, and that can be really fast, although our benchmarks are a bit outdated.
https://github.com/rwth-i6/returnn