by jph00
3378 days ago
Sure, but on PyTorch they incur the kernel launch overhead on every pass through the loop, whereas on TensorFlow and Theano they do not. That really affects which kinds of algorithms work well on each platform. Does that seem like a reasonable assessment to you?

With TF's XLA compiler, they are slowly moving towards kernel fusion, which will in turn reduce launch overheads.
We have similar things in the works for PyTorch: quickly JIT-compiling, at runtime, the dynamic graph that is being executed. More news on this will come when the time is right.
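
To make the launch-overhead point concrete, here is a toy cost model in plain Python. It is only a sketch: it counts simulated kernel launches and does no GPU work. The function names, the assumption that every op in an iteration can be fused, and the per-launch cost passed in are all hypothetical, not taken from PyTorch, TensorFlow, or XLA.

```python
# Toy cost model: counts simulated kernel launches only; no real GPU work is done.
# All names and numbers here are illustrative assumptions, not framework APIs.


def eager_launches(num_iterations, ops_per_iteration):
    """Eager execution: each op in each loop iteration is launched separately."""
    launches = 0
    for _ in range(num_iterations):
        for _ in range(ops_per_iteration):
            launches += 1  # one launch per op
    return launches


def fused_launches(num_iterations, ops_per_iteration):
    """Kernel fusion: the ops of one iteration are merged into a single kernel.

    Assumes every op in the iteration is fusible, which real compilers
    cannot always guarantee.
    """
    launches = 0
    for _ in range(num_iterations):
        if ops_per_iteration > 0:
            launches += 1  # one launch per iteration
    return launches


def total_overhead_us(launches, overhead_per_launch_us):
    """Total launch overhead in microseconds for a given per-launch cost."""
    return launches * overhead_per_launch_us


if __name__ == "__main__":
    iterations, ops = 1000, 5
    per_launch_us = 10.0  # arbitrary placeholder, not a measured value
    for name, count in [
        ("eager", eager_launches(iterations, ops)),
        ("fused", fused_launches(iterations, ops)),
    ]:
        print(name, count, "launches,", total_overhead_us(count, per_launch_us), "us")
```

In this model, fusion divides the number of launches by the number of ops per iteration, so the saving matters most for loops made of many small ops, which is the kind of algorithm the question above is about.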