Hacker News new | ask | show | jobs
by LarsDu88 708 days ago
I was wondering... this post mentions that ops like sigmoid are very slow.

A lot of modern LLMs use activation functions with sigmoid or soft max like SiLU, Swish, and SOLU.

Does Relu take less of a performance hit, and if so, maybe it'd be better to go back to good old relu?

1 comments

Relu is literally just a linear function that gets clamped to zero at some point, so yes, it's much less computationally intensive than anything involving an exponential function. But I doubt you would get competitive results using such a simple activation.