by LarsDu88
708 days ago
I was wondering... this post mentions that ops like sigmoid are very slow. A lot of modern LLMs use activation functions built on sigmoid or softmax, like SiLU, Swish, and SoLU. Does ReLU take less of a performance hit, and if so, maybe it'd be better to go back to good old ReLU?
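For reference, here's a rough plain-Python sketch of the functions I mean, assuming the usual definitions: SiLU as x * sigmoid(x) (which is Swish with beta = 1) and SoLU as x * softmax(x). It only shows which operations each one involves; it isn't a benchmark and says nothing about how any particular hardware implements them.

```python
import math

def relu(x):
    # ReLU: one comparison per element, no exponentials.
    return [max(0.0, v) for v in x]

def sigmoid(v):
    # Logistic sigmoid: one exp and one division per element.
    return 1.0 / (1.0 + math.exp(-v))

def silu(x):
    # SiLU (Swish with beta = 1, by assumption): x * sigmoid(x).
    return [v * sigmoid(v) for v in x]

def solu(x):
    # SoLU, assumed here to be x * softmax(x) over the whole vector.
    # Subtracting the max before exp is the standard stability trick.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [v * e / total for v, e in zip(x, exps)]
```

So ReLU is just a max, SiLU needs an exp and a divide per element, and SoLU additionally needs a reduction (max and sum) across the vector.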