| HN Mirror



	by jph00 3425 days ago
	Sure - but on PyTorch they pay the kernel launch overhead on every pass through the loop, whereas on TensorFlow and Theano they do not, which really affects the kinds of algorithms that work well on each platform. Does that seem like a reasonable assessment to you?

2 comments

smhx 3425 days ago

Currently, not many frameworks actually fuse kernels (to avoid launching many GPU kernels). If you look underneath a theano.scan or tf.scan, GPU kernels are still being launched individually (but are likely stream-overlapped where appropriate).

With TF's XLA compiler, they are slowly getting towards kernel fusion, which will then reduce launch overheads.
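To make the fusion point concrete, here is a toy sketch in plain Python. It is not how XLA, Theano, or PyTorch work internally, and it never touches a GPU: `LaunchCounter`, `unfused`, and `fused` are hypothetical names, and each `launch()` call just stands in for one GPU kernel launch. It only illustrates that the same elementwise arithmetic can cost three launches or one.

```python
# Toy model of kernel fusion. All names here are hypothetical and
# nothing runs on a GPU; launch() merely counts simulated launches.

class LaunchCounter:
    def __init__(self):
        self.launches = 0

    def launch(self, kernel, *arrays):
        """Apply `kernel` elementwise and count it as one launch."""
        self.launches += 1
        return [kernel(*vals) for vals in zip(*arrays)]


def unfused(counter, x, y):
    # Three separate elementwise kernels: multiply, add, then ReLU.
    t1 = counter.launch(lambda a, b: a * b, x, y)
    t2 = counter.launch(lambda a, b: a + b, t1, y)
    return counter.launch(lambda a: max(a, 0.0), t2)


def fused(counter, x, y):
    # One kernel doing the same arithmetic per element.
    return counter.launch(lambda a, b: max(a * b + b, 0.0), x, y)


x = [1.0, -2.0, 3.0]
y = [0.5, 0.5, 0.5]

c_unfused, c_fused = LaunchCounter(), LaunchCounter()
out_unfused = unfused(c_unfused, x, y)
out_fused = fused(c_fused, x, y)

print(out_unfused == out_fused)             # same result either way
print(c_unfused.launches, c_fused.launches)  # launch counts differ
```

If each launch carries a roughly fixed overhead, the fused version pays it once instead of three times, which matters most when the per-element work is small.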

We have something similar in the works for PyTorch: quickly JIT-compiling, at runtime, the dynamic graph that is being executed. More news on this will come when the time is right.


whyrt12 3425 days ago

I WANT to use PyTorch, but it has no Bayesian learning or stochastic nodes like Edward does. Any chance there are plans for a compatibility layer with Edward, or to roll your own Bayesian stuff?

Also, have you looked at Numba to do the JIT compilation? It's probably best not to have yet another separately maintained Python JIT.


smhx 3425 days ago

As core devs, we don't plan to build in something like Edward. However, folks in the community are brewing something:

https://discuss.pytorch.org/t/bayesian-computation-in-pytorc...
https://discuss.pytorch.org/t/distribution-implementations/4...


apaszke 3425 days ago

To not have the kernel launch overhead you'd need to stop launching GPU kernels, but that's not how things work in any framework ;)
