Hacker News
by microtonal 1150 days ago
A colleague and I were once discussing the fast inverse square root and joked that we needed to make a (neural net) activation function that uses an inverse square root, as an excuse to use the fast inverse square root. At any rate, I did come up with an activation function that is very similar to Swish/GELU but uses an inverse square root:

https://twitter.com/danieldekok/status/1484898130441166853?s...

It's quite a bit cheaper, because it doesn't need expensive elementary functions like exp or erf.

(We did add it to Thinc: https://thinc.ai/docs/api-layers#dish)
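
For anyone who wants to see the shape without clicking through, here is a minimal sketch in plain Python. It assumes the formula is 0.5 * x * (x / sqrt(1 + x^2) + 1), which is how I read the Thinc docs. The function names and sample inputs are just for illustration and are not Thinc's API.

```python
import math


def dish(x: float) -> float:
    # Assumed formula: 0.5 * x * (x / sqrt(1 + x^2) + 1).
    # Only needs a multiply, an add and one inverse square root.
    inv_sqrt = 1.0 / math.sqrt(1.0 + x * x)
    return 0.5 * x * (x * inv_sqrt + 1.0)


def swish(x: float) -> float:
    # Swish / SiLU: x * sigmoid(x), which needs exp.
    return x / (1.0 + math.exp(-x))


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), which needs erf.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


if __name__ == "__main__":
    # Print the three activations side by side on a few sample inputs.
    for x in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(f"x={x:+.1f}  dish={dish(x):+.4f}  swish={swish(x):+.4f}  gelu={gelu(x):+.4f}")
```

The `1.0 / math.sqrt(...)` term is the spot where a hardware or approximate rsqrt could slot in. Whether that actually wins on a given backend is something you would have to measure.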

1 comment

Haha, I remember that the internal name for it was DoomSwish for a while, or something like that, because the fast inverse square root is often (I think wrongly?) attributed to John Carmack. But it was used in Quake anyways, right? XD