Hacker News new | ask | show | jobs
by abeppu 2478 days ago
Maybe this is the piece I don't understand. What do you mean by "more than an expression-only language"? What is a non-expression in a language? And what would it mean to have a derivative for your non-expression?
1 comments

You can easily use expressions to create trees rather than values in a library. Eg a + b need not compute a value, through a bit of operating overloading it can compute the tree plus(tree-a, tree-b).

This “trick” does not extend to statements, however. You can’t override if or semicolon in most languages. You can encode statements as expressions, but then you have to worry about things like variable bindings on your own.

Building expression trees is not the only way to do automatic differentiation. A simpler way is to just carry the value and it's derivative as a pair:

    #include <math.h>
    #include <stdio.h>

    struct autodiff { double value, deriv; };

    autodiff just(double x) { return { x, 1 }; }

    autodiff operator +(autodiff a, autodiff b) {
        return { a.value + b.value, a.deriv + b.deriv };
    }

    autodiff operator *(autodiff a, autodiff b) {
        return { a.value * b.value, a.deriv*b.value + a.value*b.deriv };
    }

    autodiff sin(autodiff a) {
        return { sin(a.value), cos(a.value)*a.deriv };
    }

    int main() {
        autodiff x = just(.1);
        for (int ii = 0; ii<4; ii++) {
            x = x*x + sin(x);
        }
        printf("value: %lf, deriv: %lf\n", x.value, x.deriv);
        return 0;
    } 
There is no need to differentiate the for loop or the semicolons. This way is not doing symbolic differentiation. It's implementing the differentiation rules in parallel to calculating the values at run time.

This generalizes to partial derivatives for multivariate functions too:

     template<int dims>
     struct autograd { double value, grad[dims}; };
This only works if you don't have x as a value determining the length of the for loop. If you try to take the derivative of x^x using a loop multiplying x times, you'll get a wrong answer (admittedly, expecting a right answer would be foolish).

But this goes to question at hand, whether AD should be a library/DSL or whether it should be a primitive of a general purpose language. The thing is a general purpose language lets you present sorts of things as functions that can't be even dual numbers won't take the derivative of correctly - a dual scheme can't distinguish variable order loops from constant order loops.

That's a fair point (and worth documenting as part of the library), but I'd be much more likely to implement pow(x, x) carefully than switch to a different language or compiler.
I appreciate the library approach and I basically agree with you. If anything, I wanted to point out that a normal general language is so general that adding differentiation into it would be highly costly - just this feature would require that every loop and every conditional return be watched (relative to pow(x,x), an even more challenging example is the Newton's method implementation of a continuous function using a loop).

An alternative would be creating a general purpose language just for this feature - an interesting though rather specialized project.

I'm not certain, but I suspect you'll hit the halting problem if you tried to be completely general purpose.