by microtonal
1150 days ago
A colleague and I were once discussing the fast inverse square root and joked that we should make a (neural net) activation function that uses an inverse square root, just as an excuse to use the fast inverse square root. At any rate, I did come up with an activation function whose shape is very similar to Swish/GELU but that uses an inverse square root: https://twitter.com/danieldekok/status/1484898130441166853?s... It's quite a bit cheaper, because it doesn't need expensive transcendental functions like exp or erf. (We did add it to Thinc: https://thinc.ai/docs/api-layers#dish)
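For the curious, here is a minimal Python sketch of Dish, assuming the definition dish(x) = 0.5 * x * (1 + x / sqrt(1 + x^2)), which is my reading of Thinc's implementation rather than something stated in this comment. The term x / sqrt(1 + x^2) is a smooth sigmoid-like curve over (-1, 1), so 0.5 * (1 + x / sqrt(1 + x^2)) acts as the gate that Swish and GELU build from exp or erf. The `fast_rsqrt` and `dish_fast` helpers are hypothetical, added only to act out the joke with the classic 0x5F3759DF trick; they are not part of Thinc.

```python
import math
import struct


def dish(x: float) -> float:
    """Dish activation, assuming dish(x) = 0.5 * x * (1 + x / sqrt(1 + x^2)).

    The gate 0.5 * (1 + x / sqrt(1 + x^2)) lies in (0, 1), playing the role
    of the sigmoid in Swish or the normal CDF in GELU, without exp or erf.
    """
    return 0.5 * x * (1.0 + x / math.sqrt(1.0 + x * x))


def fast_rsqrt(v: float) -> float:
    """Hypothetical helper: the classic fast inverse square root, v ** -0.5.

    Only valid for positive, finite inputs that fit in a float32.
    """
    # Reinterpret the float32 bit pattern of v as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", v))[0]
    # Magic-constant initial guess for 1 / sqrt(v).
    i = 0x5F3759DF - (i >> 1)
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step to refine the estimate.
    return y * (1.5 - 0.5 * v * y * y)


def dish_fast(x: float) -> float:
    """Hypothetical variant of dish() using fast_rsqrt for 1 / sqrt(1 + x^2)."""
    return 0.5 * x * (1.0 + x * fast_rsqrt(1.0 + x * x))


if __name__ == "__main__":
    for x in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(f"{x:5.1f}  dish={dish(x):.5f}  dish_fast={dish_fast(x):.5f}")
```

Under this definition the gate tends to 1 for large positive x and to 0 for large negative x, so dish(x) approaches x and 0 respectively, with a shallow negative dip for moderately negative inputs: the same qualitative shape as Swish and GELU. The fast variant trades a small relative error in the inverse square root (well under 1% after one Newton step) for avoiding the sqrt and division entirely.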