Hacker News
by baron_harkonnen 1719 days ago
> chain rule is basic calculus that precedes understanding hessians.

It doesn't have to be that way. The Hessian is an abstract idea, and the chain rule (and, more specifically, backpropagation) is a method of computing the results for that abstract idea. When I want the Hessian, I want a matrix of second-order partial derivatives; I'm not interested in how those are computed.
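A minimal sketch of that point, assuming JAX is installed: `jax.hessian` returns a function that evaluates the matrix of second-order partial derivatives, and the caller never says how it is computed. The function `f` and the evaluation point are made-up examples; the hand-derived Hessian in the comment is only there to show the result matches the definition.

```python
import jax
import jax.numpy as jnp

# Hypothetical scalar function of two variables: f(x, y) = x^2 * y + y^3
def f(v):
    x, y = v[0], v[1]
    return x**2 * y + y**3

# jax.hessian builds a function that returns the matrix of second-order partials
hess_f = jax.hessian(f)

point = jnp.array([1.0, 2.0])
H = hess_f(point)

# By hand, the Hessian is [[2y, 2x], [2x, 6y]], which at (1, 2) is [[4, 2], [2, 12]]
print(H)
```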

For a more concrete example, would you say that using the quantile function for the normal distribution requires you to be able to implement it from scratch?

There are many very smart, very knowledgeable people who correctly use the normal quantile function (inverse CDF) every day for essential quantitative computation and who have absolutely no idea how to implement the inverse error function (an essential part of the normal quantile). Would you say that you don't really know statistics if you can't do this? That a beginner must understand the implementation details of the inverse error function before making any claims about normal quantiles? I myself would absolutely need to pull up a copy of Numerical Recipes to do this. It would be, in my opinion, ludicrous to say that anyone wanting to write statistical code should understand and be able to implement the normal quantile function. Maybe in 1970 that was true, but we now have software to abstract that away for us.
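To illustrate, Python's standard library exposes the normal quantile as `statistics.NormalDist().inv_cdf`, which plays the same role as the `scipy.stats.norm.ppf` call mentioned in the next paragraph. The caller never touches the inverse error function. The only check below uses the forward CDF written with `math.erf`, via the identity Φ(x) = ½(1 + erf(x/√2)), or equivalently Φ⁻¹(p) = √2·erf⁻¹(2p − 1). The probability 0.975 is an arbitrary example.

```python
from math import erf, sqrt
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quantile (inverse CDF): the z-value with 97.5% of the probability mass below it
p = 0.975
z = std_normal.inv_cdf(p)

# Sanity check via the forward CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2))),
# without ever implementing the inverse error function ourselves
p_back = 0.5 * (1.0 + erf(z / sqrt(2.0)))

print(f"z = {z:.6f}, Phi(z) = {p_back:.6f}")
```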

The same is becoming true of backprop. I can simply call jax.grad on my implementation of the loss for the forward pass of the NN I'm interested in and get the gradient of that function, the same way I can call scipy.stats.norm.ppf to get a quantile of a normal distribution. To use the quantile function correctly, all that matters is that you understand what the normal quantile function means, and again, I suspect there are many practicing statisticians who don't know how to implement it.
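A minimal sketch of that workflow, assuming JAX is installed. The "network" here is a hypothetical single linear layer with a mean-squared-error loss, chosen because its gradient also has a simple closed form, (2/n)·Xᵀ(Xw − y), that the automatic result can be compared against. The data are made up.

```python
import jax
import jax.numpy as jnp

# Hypothetical forward pass: a single linear layer with no bias
def forward(w, X):
    return X @ w

# Mean-squared-error loss of the forward pass
def loss(w, X, y):
    residual = forward(w, X) - y
    return jnp.mean(residual ** 2)

# jax.grad differentiates loss with respect to its first argument, w
grad_loss = jax.grad(loss)

# Small made-up data set: 4 samples, 3 features
X = jnp.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 1.0],
               [1.0, 1.0, 0.0],
               [2.0, 0.0, 1.0]])
y = jnp.array([1.0, 2.0, 0.5, 3.0])
w = jnp.array([0.1, -0.2, 0.3])

g_auto = grad_loss(w, X, y)

# Hand-derived gradient of the MSE for comparison: (2 / n) * X^T (X w - y)
n = X.shape[0]
g_manual = (2.0 / n) * (X.T @ (X @ w - y))

print(g_auto, g_manual)
```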

And to give you a bit of context, my view on this has developed from working with many people who can pass a calculus exam and perform the necessary steps to compute a derivative, yet have almost no intuition about what a derivative means or how to use it and reason about it. Calculus historically focused on computation over intuition because that was what was needed to do practical work with calculus. Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.


> Today the computation can take second place to the intuition because we have powerful tools that can take care of all the computation for you.

And that tool is backprop. If you do not understand what the chain rule is and what it is doing, that tool will be magic to you, and you are blindly trusting its correctness. Given how much risk is involved in using AI models in real life, blindly trusting your model is not a good approach.
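To make "what it is doing" concrete, here is a minimal hand-written sketch in plain Python (no framework) of reverse-mode differentiation for the made-up function f(x) = sin(x²). The forward pass records intermediate values, and the backward pass multiplies local derivatives from the output back to the input, which is exactly the chain rule. Backprop tools automate this bookkeeping over much larger computation graphs; this is an illustration, not how any particular library is implemented.

```python
import math

def f_and_grad(x):
    """Return f(x) = sin(x**2) and df/dx, computed with a manual backward pass."""
    # Forward pass: record each intermediate value
    u = x ** 2          # u = x^2
    f = math.sin(u)     # f = sin(u)

    # Backward pass: start from df/df = 1 and apply the chain rule step by step
    df_df = 1.0
    df_du = df_df * math.cos(u)   # d sin(u)/du = cos(u)
    df_dx = df_du * 2.0 * x       # du/dx = 2x
    return f, df_dx

x = 1.3
value, grad = f_and_grad(x)

# Compare against a central finite-difference estimate of the derivative
h = 1e-6
numeric = (math.sin((x + h) ** 2) - math.sin((x - h) ** 2)) / (2 * h)
print(value, grad, numeric)
```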

I agree that simply regurgitating the rules of calculus is pointless for understanding, but that's definitely not what I mean when I talk about the need to understand the chain rule.

ML is a mathematically intensive subject. There is no getting around this fact.

Do you know all the assembler instructions your PC/Mac carried out for you in order to post this text on HN? I guess not.

But that's my point. Knowing how to compile a program does not make me a compiler engineer. In that sense, feel free to use ML tools, but don't be fooled into thinking you will get a job as an ML engineer if you do not know what the chain rule is, or why we need to take a derivative in order to optimise a loss function. In fact, don't even be fooled into thinking you will get into an ML uni degree if you don't know what the chain rule is. I actually don't understand what the problem is. Spend 10 minutes reading up on it and I am sure you will get it. I think an unwarranted phobia of mathematics is what is at play here.
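As a minimal illustration of why the derivative matters for optimising a loss: its sign and size tell us which direction lowers the loss, and gradient descent repeatedly steps against it. The quadratic loss, learning rate, and step count below are arbitrary choices for the sketch.

```python
def loss(w):
    # Hypothetical loss with its minimum at w = 3
    return (w - 3.0) ** 2

def dloss_dw(w):
    # Derivative of the loss: d/dw (w - 3)^2 = 2 (w - 3)
    return 2.0 * (w - 3.0)

w = 0.0              # initial guess
learning_rate = 0.1  # step size
for _ in range(100):
    # Step against the derivative: if the loss rises as w grows, shrink w, and vice versa
    w -= learning_rate * dloss_dw(w)

print(w, loss(w))
```

With these settings the distance to the minimum shrinks by a factor of 0.8 per step, so after 100 steps `w` is within about 1e-9 of 3.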