| HN Mirror



	by agnosticmantis 779 days ago
	How GPU-friendly is this class of models?

1 comments

cloudhan 779 days ago

Very unfriendly.

The symbolic library (the set of activation types) requires branching at the very core of the kernel. The GPU will need to serialize these operations warp-wise.

To optimize, you might do a scan operation beforehand and dispatch to the activation functions in a warp-specialized way. This, however, makes global memory reads and writes non-coalesced.

You could then sort the input by activation type and store it in that order. This makes global memory (gmem) I/O coalesced, but requires a gather and a scatter as pre- and post-processing.
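A rough CPU-side sketch of the sort, gather, and scatter idea, in plain Python rather than a real kernel. The activation set `ACTS` and the names `apply_branchy` and `apply_sorted` are made up for illustration; an actual GPU version would presumably use a parallel sort or scan instead of `sorted`.

```python
import math

# Hypothetical activation library; an integer "kind" indexes into this list.
ACTS = [math.sin, math.tanh, lambda v: v * v]

def apply_branchy(xs, kinds):
    # Per-element dispatch: the pattern that would diverge inside a warp.
    return [ACTS[k](x) for x, k in zip(xs, kinds)]

def apply_sorted(xs, kinds):
    n = len(xs)
    # Pre-processing: sort indices by activation type (the gather permutation).
    order = sorted(range(n), key=lambda i: kinds[i])
    gathered = [xs[i] for i in order]
    sorted_kinds = [kinds[i] for i in order]

    # Equal types are now contiguous, so one function handles a whole run.
    out_sorted = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end < n and sorted_kinds[end] == sorted_kinds[start]:
            end += 1
        f = ACTS[sorted_kinds[start]]
        for j in range(start, end):
            out_sorted[j] = f(gathered[j])
        start = end

    # Post-processing: scatter results back to their original positions.
    out = [0.0] * n
    for j, i in enumerate(order):
        out[i] = out_sorted[j]
    return out
```

Both functions return the same values; the sorted version just trades the per-element branch for two extra permutation passes.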


jiggawatts 779 days ago

Wouldn't it be faster to calculate every function type and then just multiply them by 0s or 1s to keep the active ones?
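For concreteness, a minimal sketch of what "compute everything, then mask" would look like, written as plain Python instead of GPU code. The activation list `ACTS` and the function `apply_masked` are hypothetical names, not from any particular library.

```python
import math

# Hypothetical activation library; an integer "kind" indexes into this list.
ACTS = [math.sin, math.tanh, lambda v: v * v]

def apply_masked(xs, kinds):
    out = []
    for x, k in zip(xs, kinds):
        total = 0.0
        # Evaluate every activation, then keep only the active one via a 0/1 mask.
        for idx, f in enumerate(ACTS):
            mask = 1.0 if idx == k else 0.0
            total += mask * f(x)
        out.append(total)
    return out
```

The work per element grows with the number of function types, and with IEEE floats a multiply by 0 does not erase an inf or NaN produced by an inactive function, so a select is probably safer than a literal multiply.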


samus 779 days ago

That's pretty much how branching on GPUs already works.


svantana 778 days ago

Couldn't you implement these as a texture lookup, where x is the input and the various functions are stacked along y? That should be quite fast on GPUs.
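A small sketch of that lookup-table idea, emulating a 2D texture in plain Python: one row per function, x sampled on a fixed grid, with linear interpolation and clamp-to-edge addressing. The range `X_MIN`..`X_MAX`, the resolution `WIDTH`, and the names `build_table` and `sample` are all assumptions picked for illustration.

```python
import math

# Hypothetical activation library; each function becomes one row of the table.
ACTS = [math.sin, math.tanh, lambda v: v * v]

# Assumed sampling range and resolution of the table along x.
X_MIN, X_MAX, WIDTH = -4.0, 4.0, 1025

def build_table():
    step = (X_MAX - X_MIN) / (WIDTH - 1)
    # Row index = function type (y), column index = sampled input (x).
    return [[f(X_MIN + i * step) for i in range(WIDTH)] for f in ACTS]

def sample(table, x, kind):
    # Clamp-to-edge: inputs outside the range reuse the boundary value.
    xc = min(max(x, X_MIN), X_MAX)
    u = (xc - X_MIN) / (X_MAX - X_MIN) * (WIDTH - 1)
    i0 = min(int(u), WIDTH - 2)
    t = u - i0
    row = table[kind]
    # Linear interpolation between the two neighbouring samples.
    return (1.0 - t) * row[i0] + t * row[i0 + 1]
```

This removes the branch entirely, at the cost of approximation error and a fixed input range; real texture hardware typically interpolates at lower precision than this float version, so accuracy there would likely be worse.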
