|
|
|
|
|
by CompleteSkeptic
3175 days ago
|
|
An appropriate quote: "If you can't intelligently argue for both sides of an issue, you don't understand the issue well enough to argue for either." There are many people for whom the declarative paradigm is a huge plus. I would say there are at least 2 major approaches in running fast neural networks: 1. Figure out the common big components and make fast versions of those. 2. Figure out the common small components and how to make those run fast together. Different libraries have different strengths and weaknesses that match the abstraction level that they work at. For example, Caffe is the canonical example of approach 1, which makes writing new kinds of layers much harder than with other libraries, but makes connecting those layers quite easy as well as enabling new techniques that work layer-wise (such as new kinds of initialization). Approach 2 (TensorFlow's approach) introduces a lot of complexity, but it allows for different kinds of research. For example, because how you combine the low-level operations is decoupled from how those things are optimized together, you can more easily create efficient versions of new layers without resorting to native code. |
|
At least Tensorflow isn't at that level, because its "declarative" syntax is just yet another imperative language living on top of Python. But it still makes performance debugging really hard.
With PyTorch, I can just sprinkle torch.cuda.synchronize() liberally and the code will tell me exactly which CUDA kernel calls are consuming how much milliseconds. With Tensorflow, I have no idea why it is slow, or whether it can be any faster at all.