Hacker News new | ask | show | jobs
by qcoh 3820 days ago
Symbolic differentiation is what you learned in school and do by hand. Automatic differentiation is, as far as I understand it, you take your nice function, develop it into a Taylor series around x:

    f(x + h) = f(x) + f'(x) h + f''(x) h^2 / 2 + ...
Suppose that f is a really, really nice function (defined and holomorphic in a large enough domain in the complex numbers and a Taylor expansion with real coefficients). The curious thing happens when you think of h as just an indeterminate and you evaluate it, for instance, at i s, where i =sqrt(-1) and s is a tiny number. Then

    f(x + is) = f(x) + f'(x) is + f''(x) (-s^2)/2 + ...
Just looking at the imaginary part gives you:

    Im(f(x+is))/s = f'(x) + O(s^2).
This means you can compute the derivative with an O(s^2) error just by evaluating at a complex number. The only thing needed was that all operations are implemented not just for real numbers but complex numbers. Instead of the R-algebra R[i] = C, why not try to use the R-algebra R[ε] and evaluate at ε, where ε is a symbol with ε^2 = 0? Remember, i is also just a symbol with the property i^2 = -1!!!

    f(x + ε) = f(x) + f'(x) ε + f''(x) ε^2/2 + ...
Since 0 = ε^2 = ε^3 = ..., all higher order terms vanish and you get

    f'(x) = coefficient of ε in f(x + ε).
(Compare with the complex numbers: Im = "coefficient of i" but this time around, there is no error)

Everything that you need is to know how to implement arithmetic operations for numbers in R[ε], e.g.

    (a+b ε) + (c + d ε) = (a+c) + (b+d) ε
    (a+b ε) (c + d ε) = ac + (ad + bc) ε + bd ε^2 = ac + (ad + bc) ε
A problem occurs when you try to divide by ε. This is simply undefined in R[ε], which, unlike C is not a field. But not all is lost, suppose you want to divide (a + b ε) by (c + d ε), where c is not 0, then

    (a+b ε) / (c + d ε) = (a + b ε) (c - d ε) / (c^2 + (-cd + cd) ε) = something with ε in the numerator / c^2,
That is, division by ε only occurs when you divide by 0 in the first place.

To recapitulate: If you have a class DualNumber with overloaded +, -, * , /, ^, you can find the derivative of, say, the fourth-order approximation of e^x at x = 0.5, by computing

    DualNumber x = DualNumber(0.5, 1);
    DualNumber approx = 1 + x + x^2/2 + x^3/6 + x^4/24;
    cout << approx.EpsCoeff() << endl;
(Sorry if this not a good explanation but I've spent too much time on it :P)