by taliesinb 2478 days ago
> It's hard to reformulate programs in a way s.t. the derivative can mean something meaningful.

Really? The gradients computed by AD are the exact answer to the following question: if I were to change this input or parameter by an infinitesimal amount, how much would the output of my function change? That is always meaningful (when it is defined), and means what I just said. You can easily make functions where it is not defined, of course, just like you can make a sphere into two spheres with the Banach-Tarski theorem!

But there are vast, vast forests of numerical computation employed in industry, science, finance, engineering, where it is almost always defined.

And even for more “chunky” computations where the non-differentiability is more severe, there are algorithms like REINFORCE that you can use to estimate gradients through these parts.
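For readers unfamiliar with REINFORCE, here is a rough sketch of the score-function estimator it is built on. Everything here is illustrative: `step_count`, `score_function_grad`, the Gaussian noise, and the baseline value are assumptions made for this example, not anything from a particular library. Note that it estimates the gradient of a smoothed objective, E[f(z)] with z drawn around theta, rather than the gradient of f itself.

```python
import math
import random


def step_count(z):
    """Hypothetical piecewise-constant function: its ordinary derivative is 0 wherever it exists."""
    return max(0, math.ceil(z))


def score_function_grad(f, theta, sigma=1.0, n_samples=100_000, baseline=0.0, seed=0):
    """Estimate d/d(theta) of E[f(z)] for z ~ Normal(theta, sigma^2).

    Uses the identity
        d/d(theta) E[f(z)] = E[(f(z) - baseline) * d/d(theta) log p(z; theta)],
    where d/d(theta) log p(z; theta) = (z - theta) / sigma^2 for a Gaussian.
    The baseline does not change the expectation; it only reduces variance.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(theta, sigma)
        total += (f(z) - baseline) * (z - theta) / sigma**2
    return total / n_samples


# f is never differentiated; only the sampling distribution is.
estimate = score_function_grad(step_count, theta=5.0, baseline=5.0)
print(estimate)
```

With these settings the estimate should land close to 1, since on average the smoothed step function rises by about one unit per unit of theta. How useful that smoothed gradient is depends on the problem, and the estimator's variance grows quickly without a good baseline.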

1 comment

It's not meaningful in the sense that it won't correspond with your intuition of "will increasing the input increase my output". Presumably the point of differentiable programming isn't just getting the derivative for fun, it's for optimizing some quantity.

For example, take this code.

     double f(double x, double y) {
         for (int i = 0; i < x; i++)
             y += 1;   // x only controls how many times this runs
         return y;
     }
It's technically true that a gradient of 0 with respect to x is correct (modulo the boundaries where the loop count jumps). But if someone were trying to optimize this function, that's not very helpful.
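To make the zero gradient concrete, here is a minimal forward-mode AD sketch in Python using dual numbers. The `Dual` class and the `grad_x` / `grad_y` helpers are hypothetical names invented for this illustration, and only addition is supported, which is all the loop needs.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(float(other))
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__


def f(x, y):
    # Same loop as the example above: x only decides the trip count.
    i = 0
    while i < x.val:  # the comparison reads the value only; no derivative flows through it
        y = y + 1
        i += 1
    return y


def grad_x(x, y):
    """Derivative of f with respect to x: seed x with 1, y with 0."""
    return f(Dual(x, 1.0), Dual(y, 0.0)).der


def grad_y(x, y):
    """Derivative of f with respect to y: seed y with 1, x with 0."""
    return f(Dual(x, 0.0), Dual(y, 1.0)).der


print(grad_x(2.5, 0.0))  # 0.0
print(grad_y(2.5, 0.0))  # 1.0
print(f(Dual(2.5), Dual(0.0)).val, f(Dual(3.5), Dual(0.0)).val)  # 3.0 4.0
```

The output does grow when x goes from 2.5 to 3.5, but the derivative with respect to x is 0 at both points, so a gradient-based optimizer gets no signal about which way to move x.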

I believe REINFORCE is not of much help either - it's not magic. I'm not aware of any stochastic gradient estimators that are helpful in this case (although if there is such a method, I'd like to hear about it).