> to read this as though it is very obvious and commonly accepted fact

I'm not entirely sure what you're referring to by "this", but assuming you mean my comment, I think what I'm saying is very much up for debate and not an "obvious and commonly accepted fact". Karpathy has a very reasonable argument that directly disagrees with what I'm suggesting [0]. Of course, he also agrees that in practice nobody will ever use backprop directly.

Whether it's JAX, TF, PyTorch, etc., the chain rule will be applied for you. I'm arguing that it's helpful to not have to worry about the details of how your derivative is being computed, and instead build an intuition for using derivatives as an abstraction. To be fair, I think Karpathy is correct for people who are explicitly learning to be experts in neural networks.
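A tiny sketch of what "applied for you" means, in JAX. The function `f` and the hand-written `manual_df` are just my own toy example, not anything from Karpathy's article:

```python
import jax
import jax.numpy as jnp

def f(x):
    # A composite function: sin applied to x squared.
    return jnp.sin(x ** 2)

def manual_df(x):
    # The chain rule worked out by hand: cos(x^2) * 2x.
    return jnp.cos(x ** 2) * 2.0 * x

# jax.grad builds the derivative function without us writing the chain rule.
auto_df = jax.grad(f)

x = 1.5
print(auto_df(x), manual_df(x))  # the two values should agree up to float error
```

You never had to write `manual_df`; that's the part I'm saying beginners can treat as a black box at first.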

My point is more that, given how powerful our tools for computing derivatives are today (I think JAX/Autograd have improved since Karpathy wrote that article), it's better to teach programmers to think of derivatives, gradients, Hessians, etc. as high-level abstractions, worrying less about how to compute them and more about how to use them. That way, thinking about modeling doesn't need to be restricted to NNs. Instead, use NNs as an example, and then show the student that they are free to build any model by defining how the model predicts, scoring the prediction, and using the tools of calculus to answer other common questions they might have.
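Here's roughly what I mean, as a minimal JAX sketch. Everything in it is made up for illustration: the exponential-decay model, the synthetic data, the step size, and the iteration count are my assumptions, and it's plain gradient descent rather than anything clever:

```python
import jax
import jax.numpy as jnp

def predict(params, x):
    # Hypothetical non-NN model: y = a * exp(-b * x).
    a, b = params[0], params[1]
    return a * jnp.exp(-b * x)

def loss(params, x, y):
    # Score the prediction with mean squared error.
    return jnp.mean((predict(params, x) - y) ** 2)

# Synthetic, noise-free data generated with a = 2.0, b = 0.7 (assumed values).
x = jnp.linspace(0.0, 4.0, 50)
y = 2.0 * jnp.exp(-0.7 * x)

# The gradient is just "a thing we ask for", not something we derive.
grad_loss = jax.jit(jax.grad(loss))

init_params = jnp.array([1.0, 0.1])
params = init_params
for _ in range(5000):
    params = params - 0.05 * grad_loss(params, x, y)

# Same idea for second-order questions: the Hessian of the loss at the fit
# describes its local curvature, i.e. how sharply each parameter is pinned down.
H = jax.hessian(loss)(params, x, y)

print(params)
print(H)
```

Nothing in there is specific to neural networks. Swap `predict` for an NN, a logistic regression, or whatever you like, and the rest stays the same.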

edit: a good analogy is logic programming and backtracking/unification. The entire point of logic programming is to abstract away backtracking. Sure, experts in Prolog do need to understand backtracking, but it's more helpful to get beginners understanding how Prolog behaves than to have them understand the details of backtracking.

[0] https://karpathy.medium.com/yes-you-should-understand-backpr...