by medo-bear
1719 days ago
i am not sure what you are disagreeing with. chain rule is basic calculus that precedes understanding hessians. my argument is, if you cannot understand what the chain rule is, you will not understand more complicated mathematics in ML. do you think i am wrong? EDIT: also uncertainty estimation is the stuff of the probabilistic approach to ML. i would say that people who do probabilistic ML are quite mathematically capable (at least in my experience)
|
It doesn't have to be that way. The Hessian is an abstract idea; the chain rule, and more specifically backpropagation, are methods for computing it. When I want the Hessian, I want a matrix of second-order partial derivatives; I'm not interested in how those are computed.
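As a rough sketch of what "I want the matrix, not the procedure" can look like in practice: the function `f` and the point `x` below are made up purely for illustration, and `jax.hessian` is used as the off-the-shelf tool. The hand-derived matrix is only there as a sanity check.

```python
import jax
import jax.numpy as jnp


# Hypothetical scalar function of two variables, chosen only for illustration:
# f(x0, x1) = x0^2 * x1 + sin(x1)
def f(x):
    return x[0] ** 2 * x[1] + jnp.sin(x[1])


x = jnp.array([1.0, 2.0])

# Ask for the matrix of second-order partial derivatives directly.
H = jax.hessian(f)(x)

# Hand-derived Hessian for comparison:
# [[2*x1, 2*x0],
#  [2*x0, -sin(x1)]]
H_expected = jnp.array(
    [[2.0 * x[1], 2.0 * x[0]],
     [2.0 * x[0], -jnp.sin(x[1])]]
)

print(H)
print(jnp.allclose(H, H_expected, atol=1e-5))
```

Nothing in the calling code refers to how the derivatives are obtained; the user only has to know what a Hessian is and what shape to expect.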
For a more concrete example, would you say that using the quantile function for the normal distribution requires you to be able to implement it from scratch?
Many very smart, very knowledgeable people correctly use the normal quantile function (inverse CDF) every day for essential quantitative work, yet have absolutely no idea how to implement the inverse error function (an essential part of the normal quantile). Would you say that you don't really know statistics if you can't do this? That a beginner must understand the implementation details of the inverse error function before making any claims about normal quantiles? I myself would absolutely need to pull up a copy of Numerical Recipes to do this. It would be, in my opinion, ludicrous to say that anyone wanting to write statistical code should understand and be able to implement the normal quantile function. Maybe in 1970 that was true, but we have software to abstract that out for us.
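To make the quantile example concrete, here is a small sketch, assuming SciPy is installed. It calls `scipy.stats.norm.ppf` the way most users would, and then reproduces the same number through the standard identity Φ⁻¹(p) = √2 · erfinv(2p − 1), using `scipy.special.erfinv`. The probability `p = 0.975` is just an illustrative choice.

```python
import math

from scipy.special import erfinv
from scipy.stats import norm

p = 0.975  # illustrative probability level

# What most practitioners do: call the library's quantile function.
q_library = norm.ppf(p)

# The same quantity written via the inverse error function:
# Phi^{-1}(p) = sqrt(2) * erfinv(2p - 1)
q_identity = math.sqrt(2.0) * erfinv(2.0 * p - 1.0)

print(q_library, q_identity)
```

Even the second form still leans on a library for `erfinv` itself, which is rather the point: using the quantile correctly depends on knowing what it means, not on being able to write the numerical approximation underneath.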
The same is becoming true of backprop. I can simply call jax.grad on the loss function built on the forward pass of the NN I'm interested in and get its gradient, the same way I can call scipy.stats.norm.ppf to get a quantile of the normal. All that matters for using the quantile function correctly is that you understand what it means, and again I suspect there are many practicing statisticians who don't know how to implement it.
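A minimal sketch of the `jax.grad` workflow described here, under stated assumptions: the one-layer `predict` model, the mean-squared-error `loss`, and the toy data are all hypothetical stand-ins, not anyone's actual network.

```python
import jax
import jax.numpy as jnp


# Hypothetical forward pass: a single tanh unit, for illustration only.
def predict(params, x):
    w, b = params
    return jnp.tanh(x @ w + b)


# Hypothetical loss: mean squared error over the batch.
def loss(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)


# Toy inputs, targets, and parameters (made up).
x = jnp.array([[0.5, -1.0], [1.5, 2.0]])
y = jnp.array([0.0, 1.0])
params = (jnp.array([0.1, -0.2]), jnp.array(0.0))

# jax.grad differentiates with respect to the first argument (params)
# and returns gradients with the same tuple structure.
grad_w, grad_b = jax.grad(loss)(params, x, y)

print(grad_w, grad_b)
```

No backward pass is written by hand anywhere above. What the user does need to understand is what the result means: each entry says how the loss changes as the corresponding parameter is nudged.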
And to give you a bit of context, my view on this has developed from working with many people who can pass a calculus exam and perform the necessary steps to compute a derivative, yet have almost no intuition about what a derivative means or how to use it and reason about it. Calculus historically focused on computation over intuition because that was what was needed to do practical work with calculus. Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.