by DoctorOetker 2695 days ago
the example you gave is unfortunately a counterexample!

If we have a function f(x1, x2, ...) of 100 variables and you wonder about the gradient, there are multiple ways of calculating it.

There's symbolic differentiation, but due to the chain rule the number of terms grows rapidly, and the expression can become too large to store.

Then there's the finite difference method, whereby you calculate for each of the 100 variables x_i:

( f(x1, x2, ..., (x_i + epsilon), ..., x100) - f(x1, ..., x100) ) / epsilon

The term on the right is the same constant each time, so in total you need 100+1 forward function evaluations. And there's the issue of precision: subtracting two nearly equal floating-point values leaves only a few significant bits of the mantissa.
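
To make the 100+1 count concrete, here is a minimal sketch in plain Python. The names `finite_difference_gradient`, `CountingFunction` and `sum_of_squares`, the test function itself, and the step size 1e-6 are all my own assumptions for illustration, not anything from a particular library.

```python
def finite_difference_gradient(f, x, epsilon=1e-6):
    """Forward-difference estimate of the gradient of f at the point x."""
    base = f(x)  # the constant right-hand term, evaluated only once
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += epsilon  # perturb only coordinate i
        grad.append((f(shifted) - base) / epsilon)
    return grad


class CountingFunction:
    """Hypothetical wrapper that counts how often f is evaluated."""

    def __init__(self, f):
        self.f = f
        self.calls = 0

    def __call__(self, x):
        self.calls += 1
        return self.f(x)


def sum_of_squares(x):
    # Illustrative test function; its exact gradient is 2 * x_i.
    return sum(v * v for v in x)


f = CountingFunction(sum_of_squares)
x = [0.01 * i for i in range(100)]
grad = finite_difference_gradient(f, x)
print(f.calls)  # 101
```

Note that the result is only an approximation: shrinking epsilon reduces the truncation error but makes the cancellation in the subtraction worse.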

One of the main reasons machine learning took off is the mathematical realization (Automatic/Algorithmic Differentiation) that 1 forward and 1 backward pass is both more mathematically rigorous (it calculates the gradient itself, up to floating-point rounding, rather than a finite-difference approximation) and much more efficient.
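
For comparison, a toy sketch of reverse-mode AD in plain Python, supporting only + and * on scalars. The `Var` class, the `backward` function and the example function are hypothetical helpers I made up for illustration; real libraries are far more elaborate, but the idea of 1 forward pass that records local derivatives and 1 backward pass that accumulates them is the same.

```python
class Var:
    """A scalar node in the computation graph (hypothetical helper)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """One backward pass: accumulate d(output)/d(node) into every node.grad."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order: parents are listed before their consumers.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers first, then their parents
        for parent, local in node.parents:
            parent.grad += local * node.grad  # chain rule


def sum_of_squares_graph(xs):
    # Same illustrative function as a graph of Var nodes (the forward pass).
    total = Var(0.0)
    for x in xs:
        total = total + x * x
    return total


inputs = [Var(0.01 * i) for i in range(100)]
output = sum_of_squares_graph(inputs)  # 1 forward pass
backward(output)                       # 1 backward pass
gradient = [x.grad for x in inputs]    # all 100 partial derivatives
```

No epsilon appears anywhere: each partial derivative comes from the chain rule applied to the recorded operations, so there is no truncation error to trade off against cancellation.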

With the blackboxing of the algorithms, many end users of the ML libraries end up using ML when they don't know the functional form of a map, but will not apply automatic differentiation to a known complex function with a large number of parameters. In contrast, those end users who made sure to understand Automatic Differentiation as a tool orthogonal to arbitrary function approximation (i.e. everyone who realizes the math part of ML is very important) will be able to apply AD (or any other tricks learnt through a mathematical perspective) in situations where there is no need for arbitrary function approximation...

EDIT: whoops, I thought you were arguing for blackboxing and against mathematical interpretation. Upvoted.