by taliesinb
2478 days ago
> It's hard to reformulate programs in a way s.t. the derivative can mean something meaningful.

Really? The gradients computed by AD are the exact answer to the following question: if I were to change this input or parameter an infinitesimal amount, how much would it change the output of my function? That is always meaningful (when it is defined), and means what I just said. You can easily make functions where it is not defined, of course, just like you can make a sphere into two spheres with the Banach-Tarski theorem! But there are vast, vast forests of numerical computation employed in industry, science, finance, engineering, where it is almost always defined. And even for more “chunky” computations where the non-differentiability is more severe, there are algorithms like REINFORCE that you can use to estimate gradients through these parts.
For example, take this code.
It's technically true that the gradient of 0 is correct (modulo boundaries). But if someone were trying to optimize this function, that's not very helpful. I believe REINFORCE is not of much help either - it's not magic. I'm not aware of any stochastic gradient estimators that are helpful in this case (although if there is such a method, I'd like to hear about it).
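As a hypothetical illustration of the kind of function at issue (this is not the code referred to above), here is a minimal sketch of a piecewise-constant loss run through a toy forward-mode AD. The `Dual` class, `grad`, `step_loss`, and `smooth_loss` are all made-up names for this sketch, and the 0.5 threshold is arbitrary.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """Toy dual number: val is the value, eps is the derivative part."""
    val: float
    eps: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule on the derivative part.
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

    __rmul__ = __mul__

    def __gt__(self, other):
        # Branching looks only at the value; the derivative part is ignored.
        return self.val > _lift(other).val


def _lift(x):
    """Wrap a plain number as a constant (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def grad(f, x):
    """Derivative of f at x via forward-mode AD (seed derivative = 1)."""
    return f(Dual(x, 1.0)).eps


def step_loss(x):
    # Hypothetical "chunky" loss: output depends on x only through a branch.
    return _lift(1.0) if x > 0.5 else _lift(0.0)


def smooth_loss(x):
    # Smooth comparison case: d/dx x^2 = 2x.
    return x * x


if __name__ == "__main__":
    print(grad(smooth_loss, 3.0))  # informative gradient
    print(grad(step_loss, 0.3))    # zero: no hint that 0.5 is nearby
    print(grad(step_loss, 0.9))    # zero on the other side as well
```

The derivative of `step_loss` is 0 on both sides of the threshold even though the output differs between them, so a gradient-based optimizer starting at 0.3 gets no signal about which way to move.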