Hacker News

by wsmoses 1963 days ago

Adding onto this, numerical derivatives have two potential problems, which is why they tend not to be used in big scientific/ML frameworks.

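First of all, they lose accuracy. For example, with the standard forward difference f'(x) ≈ [f(x+h) - f(x)]/h, you subtract two nearly equal numbers and waste many bits of precision to cancellation. Shrinking h to reduce the O(h) truncation error only makes the cancellation worse, so even at the best choice of h roughly half of a double's significant digits are gone. In contrast, if you generate the derivative function directly, as below, you end up far more accurate.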

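```c
// Declared by the user; Enzyme's LLVM plugin replaces calls to it
// with the synthesized derivative at compile time.
extern double __enzyme_autodiff(void*, double);

double square(double x) { return x * x; }

double d_square(double x) { return __enzyme_autodiff((void*)square, x); }
```

becomes, in effect,

```c
double d_square(double x) { return 2 * x; }
```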
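To make the accuracy point concrete, here is a small plain-C sketch (no Enzyme involved) that compares a forward difference of `square` against the exact derivative 2x. The helper names `d_square_exact`, `forward_diff_square` and `forward_diff_error` are made up for this illustration. Assuming IEEE-754 doubles, the error is dominated by truncation for large h and by cancellation for small h, so it is smallest near h ≈ 1e-8 and grows again as h shrinks further.

```c
#include <assert.h>
#include <math.h>

/* Same function as in the example above. */
double square(double x) { return x * x; }

/* Exact derivative, i.e. what differentiating the code itself yields. */
double d_square_exact(double x) { return 2.0 * x; }

/* Forward difference [f(x+h) - f(x)] / h applied to square().
   Truncation error: h * |f''(x)| / 2, which is exactly h for square().
   Rounding error: about eps * |f(x)| / h, because f(x+h) and f(x) share
   most of their leading bits and the subtraction cancels them. */
double forward_diff_square(double x, double h) {
    assert(h > 0.0);
    return (square(x + h) - square(x)) / h;
}

/* Absolute error of the forward difference against the exact derivative. */
double forward_diff_error(double x, double h) {
    return fabs(forward_diff_square(x, h) - d_square_exact(x));
}
```

At x = 1, `forward_diff_error(1.0, 1e-8)` stays below 1e-7, while `forward_diff_error(1.0, 1e-12)` is around 1e-4 even though the step is smaller. `d_square_exact` has no such trade-off.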

Secondly, from a performance perspective, numerical differentiation is slow, especially for gradient computation. For a function of n inputs, you need to evaluate the function once per input to get the whole gradient numerically (n + 1 evaluations with forward differences, 2n with central differences). In contrast, reverse-mode AD produces the entire gradient in one call, at a cost that is a small constant multiple of one function evaluation no matter how large n is.
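The evaluation count can be seen in a small C sketch. It uses a made-up objective (`prod`, the product of all inputs) with a call counter, a forward-difference gradient that needs n + 1 calls, and a hand-written reverse-mode adjoint of the kind an AD tool would generate: one forward sweep that stores the intermediate products, then one backward sweep that accumulates adjoints. All names here are hypothetical, and the adjoint is written by hand; it is not Enzyme's output.

```c
#include <assert.h>

#define MAX_N 16

int prod_calls = 0; /* number of times prod() has been evaluated */

/* Hypothetical objective: product of all n inputs. */
double prod(const double *x, int n) {
    prod_calls++;
    double p = 1.0;
    for (int i = 0; i < n; i++) p *= x[i];
    return p;
}

/* Forward-difference gradient: one call at x, plus one per input. */
void grad_prod_fd(const double *x, int n, double h, double *g) {
    assert(n <= MAX_N && h > 0.0);
    double xp[MAX_N];
    for (int i = 0; i < n; i++) xp[i] = x[i];
    double f0 = prod(x, n);
    for (int i = 0; i < n; i++) {
        xp[i] = x[i] + h;               /* perturb one coordinate */
        g[i] = (prod(xp, n) - f0) / h;
        xp[i] = x[i];                   /* restore it */
    }
}

/* Hand-written reverse-mode adjoint of prod(); it never calls prod(). */
void grad_prod_reverse(const double *x, int n, double *g) {
    assert(n <= MAX_N);
    double p[MAX_N + 1];                /* p[i] = x[0] * ... * x[i-1] */
    p[0] = 1.0;
    for (int i = 0; i < n; i++) p[i + 1] = p[i] * x[i];   /* forward sweep */
    double p_bar = 1.0;                 /* adjoint of p[n], the output */
    for (int i = n - 1; i >= 0; i--) {  /* reverse sweep */
        g[i] = p_bar * p[i];            /* d p[i+1] / d x[i] = p[i] */
        p_bar *= x[i];                  /* d p[i+1] / d p[i] = x[i] */
    }
}
```

With five inputs, the finite-difference version calls `prod` six times and still returns only an approximation, while the reverse sweep does a small, fixed number of evaluations' worth of work however many inputs there are.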

In addition to these generic issues, we illustrate in our paper how doing this at the compiler level allows for significant additional optimization (removing unnecessary code from the forward pass, eliminating common subexpressions, and so on).

These issues are amplified further as you compute higher-order derivatives.