Hacker News
by agnosticmantis 779 days ago
How GPU-friendly is this class of models?

Very unfriendly.

The symbolic library (the set of activation function types) requires a branch at the very core of the kernel. The GPU has to serialize these operations within each warp whenever its threads pick different functions.

To optimize, you might do a scan beforehand and dispatch to the activation functions in a warp-specialized way. This, however, makes the global memory reads and writes non-coalesced.

You could then sort the inputs by activation type and store them in that order. This makes the global memory (gmem) I/O coalesced, but requires a gather and a scatter as pre- and post-processing.
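
A rough CPU-side sketch of that sort/gather/scatter idea, in plain Python rather than a real kernel. The three-function ACTIVATIONS list and the name apply_sorted are made up for illustration; they are not from any actual library.

```python
import math

# Hypothetical activation library; ids and functions are illustrative only.
ACTIVATIONS = [math.sin, math.tanh, lambda v: v * v]

def apply_sorted(xs, types):
    # Pre-processing (gather): sort indices by activation type so that
    # elements sharing a function become contiguous.
    order = sorted(range(len(xs)), key=lambda i: types[i])
    gathered = [xs[i] for i in order]
    gathered_types = [types[i] for i in order]

    # Neighbouring elements now mostly use the same function, so on a GPU
    # a warp would only diverge at the boundaries between runs.
    results = [ACTIVATIONS[t](v) for v, t in zip(gathered, gathered_types)]

    # Post-processing (scatter): write results back to original positions.
    out = [0.0] * len(xs)
    for pos, i in enumerate(order):
        out[i] = results[pos]
    return out

xs = [0.5, 2.0, -1.0, 3.0]
types = [2, 0, 1, 2]
ys = apply_sorted(xs, types)
```

The sort and the two permutation passes are the extra cost here; whether they pay for themselves presumably depends on how often the type assignment changes.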

Wouldn't it be faster to calculate every function type and then just multiply them by 0s or 1s to keep the active ones?
That's pretty much how branching on GPUs already works.
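
For reference, a minimal sketch of the "compute everything, then mask" idea in plain Python. ACTIVATIONS and apply_masked are hypothetical names used only for this example.

```python
import math

# Hypothetical activation library; ids and functions are illustrative only.
ACTIVATIONS = [math.sin, math.tanh, lambda v: v * v]

def apply_masked(x, t):
    # Evaluate every function and keep only the selected one via a 0/1 mask.
    # Every element pays for all functions, which is roughly the cost of a
    # fully divergent warp that runs each branch with inactive lanes masked.
    # Caveat: 0 * NaN and 0 * inf are NaN, so this is only safe when every
    # function is finite on every input.
    total = 0.0
    for k, f in enumerate(ACTIVATIONS):
        mask = 1.0 if k == t else 0.0
        total += mask * f(x)
    return total

y = apply_masked(2.0, 2)
```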
Couldn't you implement these as a texture lookup, where x is the input and the various functions are stacked in y? That should be quite fast on GPUs.
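
A sketch of what that lookup might look like, emulated in plain Python: one table row per function, one column per sample of x, with linear interpolation along x. The input range, table width, and all names here are assumptions for illustration, and the result is only an approximation whose accuracy depends on the table resolution.

```python
import math

# Hypothetical activation library; ids and functions are illustrative only.
ACTIVATIONS = [math.sin, math.tanh, lambda v: v * v]

# Assumed input range and table width; inputs outside the range are clamped.
X_MIN, X_MAX, WIDTH = -4.0, 4.0, 1025
STEP = (X_MAX - X_MIN) / (WIDTH - 1)

# One row per function (the "y" axis), one column per sample of x.
TABLE = [[f(X_MIN + j * STEP) for j in range(WIDTH)] for f in ACTIVATIONS]

def lookup(x, t):
    # Clamp to the table range, similar to clamp addressing on a texture.
    u = (min(max(x, X_MIN), X_MAX) - X_MIN) / STEP
    j = min(int(u), WIDTH - 2)
    frac = u - j
    row = TABLE[t]
    # Linear interpolation between the two nearest samples.
    return (1.0 - frac) * row[j] + frac * row[j + 1]

approx = lookup(1.003, 0)
```

The row index replaces the branch with a memory fetch, so all threads run the same instructions regardless of function type. The trade-offs are limited precision and the need for a bounded input range.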