by cromulen 3301 days ago
They refer to this benchmark in the blog post - http://dlbench.comp.hkbu.edu.hk/

There is also the v7 benchmark, run on a lot more hardware, where TensorFlow fares a bit better - http://dlbench.comp.hkbu.edu.hk/?v=v7

Does anyone know whether TF had a performance regression between v0.11 and v1.0, or whether it was just lucky on benchmark v7 and unlucky on v8?

Also, how does CNTK manage to be that much better than anyone else on LSTMs? Its ability to scale to bigger batch sizes is unreal. It is an order of magnitude faster than the other frameworks.

2 comments

CNTK uses the LSTM implementation by CuDNN in their official LSTM layer.

TensorFlow has multiple LSTM implementations, such as LSTMCell, BasicLSTMCell, and LSTMBlockCell, plus one wrapper for CuDNN, and maybe more. I'm quite confident that the TensorFlow runs in this benchmark did not use the CuDNN wrapper, which is a bit unfair, I would say. The CuDNN wrapper in TensorFlow does not support sequences of different lengths, but you could overcome this by just ignoring the unused frames. See here for some more details:

https://stackoverflow.com/questions/41461670/cudnnrnnforward...
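
To make "ignoring the unused frames" concrete, here is a minimal pure-Python sketch, not actual TensorFlow or CuDNN code. The helper names (pad_batch, masked_mean) and the stand-in per-frame loss are made up for illustration: you pad every sequence to the longest length, run the fixed-length computation, and then mask out the padded frames so they do not contribute to the result.

```python
def pad_batch(seqs, pad_value=0.0):
    """Pad variable-length sequences to the longest one.

    Returns (padded, mask), where mask is 1.0 on real frames
    and 0.0 on padded frames. Hypothetical helper for illustration.
    """
    max_len = max(len(s) for s in seqs)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in seqs]
    mask = [[1.0] * len(s) + [0.0] * (max_len - len(s)) for s in seqs]
    return padded, mask


def masked_mean(values, mask):
    """Average per-frame values over the real (unmasked) frames only."""
    total = sum(v * m
                for row_v, row_m in zip(values, mask)
                for v, m in zip(row_v, row_m))
    count = sum(m for row in mask for m in row)
    return total / count


seqs = [[1.0, 2.0, 3.0], [4.0]]
padded, mask = pad_batch(seqs)

# Stand-in for the per-frame loss a fixed-length RNN would produce.
# Note that it is nonzero on padded frames, which is why the mask matters.
frame_loss = [[x * 2.0 + 1.0 for x in row] for row in padded]

loss = masked_mean(frame_loss, mask)
```

The padded frames still cost compute, so this trades some wasted work for being able to use a fixed-length kernel.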

Note that you could also provide your own LSTM kernel for TensorFlow, which is what we do in our framework, and then you can get really fast, although our benchmarks are a bit outdated.

https://github.com/rwth-i6/returnn

CNTK's roots are in speech data, which inherently has a notion of time. The architecture of the toolkit supports efficient recurrence from the ground up. The toolkit also focuses on handling large, production-scale data workloads, which implies additional engineering efficiencies built into it. I am a Microsoft employee.