by LarsDu88 708 days ago

I was wondering... this post mentions that ops like sigmoid are very slow.

A lot of modern LLMs use activation functions built on sigmoid or softmax, like SiLU/Swish and SoLU.

Does ReLU take less of a performance hit, and if so, maybe it'd be better to go back to good old ReLU?

1 comment

eigenvalue 707 days ago

ReLU is literally just max(0, x): the identity for positive inputs, clamped to zero for negative ones. So yes, it's much less computationally intensive than anything involving an exponential function. But I doubt you would get competitive results using such a simple activation.
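To make the cost difference concrete, here is a minimal scalar sketch in plain Python. The function names are illustrative and not taken from any particular framework; real kernels are vectorized and often fused, so this only shows which operations each activation needs, not how fast they run in practice.

```python
import math

def relu(x: float) -> float:
    # max(0, x): a single comparison, no transcendental function.
    return max(0.0, x)

def sigmoid(x: float) -> float:
    # Logistic function; branching on the sign keeps exp() from overflowing.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def silu(x: float) -> float:
    # SiLU (Swish with beta = 1): x * sigmoid(x), so one exp() plus a
    # division and a multiply for every element.
    return x * sigmoid(x)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), silu(x))
```

ReLU needs one comparison per element, while SiLU needs an exponential, a division, and a multiply, which is where the extra cost comes from.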
