by go_prodev 547 days ago
GELU really is like magic:

UNARY(GELU, b / 2 * (1 + tanh(.7978845 * (b + .044715 * b * b * b))))


This is just a practical approximation to the actual mathematical definition of GELU, which is `GELU(x) := x * Φ(x)`, where Φ(x) is the CDF of the standard normal distribution.
Isn't that just erf()?
They are related but not the same: Φ(x) = (1 + erf(x / √2)) / 2. The error function approaches -1 for large negative numbers, whereas Φ(x) approaches 0, and so does x * Φ(x).
Fast inverse square root lookalike.
You can hand that GELU definition to a mathematician and they can interpret it as a function of a real number b. The definition does not depend upon b being a floating-point number with a particular bit representation.

In contrast, the fast inverse square root really exploits the bit representation of a floating-point input to cheaply compute an initial guess.
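
To make the contrast concrete, here is a rough C sketch of the well-known fast inverse square root trick. The name `fast_rsqrt` is hypothetical, and the code assumes `float` is a 32-bit IEEE 754 single-precision value; it uses `memcpy` to reinterpret the bits instead of the pointer cast found in the famous original.

```c
#include <stdint.h>
#include <string.h>

/* Approximates 1 / sqrt(x) for positive, normal x.
   Assumes float is a 32-bit IEEE 754 single-precision value. */
static float fast_rsqrt(float x) {
    uint32_t bits;
    float y;

    /* Reinterpret the float's bit pattern as an integer. */
    memcpy(&bits, &x, sizeof bits);

    /* Integer arithmetic on the bit pattern yields the initial guess.
       This step only makes sense for this particular bit layout. */
    bits = 0x5f3759dfu - (bits >> 1);

    /* Reinterpret the result as a float again. */
    memcpy(&y, &bits, sizeof y);

    /* One Newton-Raphson step to refine the guess. */
    y = y * (1.5f - 0.5f * x * y * y);
    return y;
}
```

The shift-and-subtract line has no meaning for a real number b. It only works because of how the exponent and mantissa are packed into the 32 bits, which is exactly what the GELU formula does not rely on.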