by medo-bear 1719 days ago
Backpropagation is a particular implementation of reverse-mode automatic differentiation, and it is the basis for all implementations of DL models. It is very strange for me to read this as though it is a very obvious and commonly accepted fact, which I don't think it is.

> to read this as though it is very obvious and commonly accepted fact

I'm not entirely sure what you're referring to by "this", but assuming you mean my comment, I think what I'm saying is very much up for debate and not an "obvious and commonly accepted fact". Karpathy has a very reasonable argument that directly disagrees with what I'm suggesting [0]. Of course, he also agrees that in practice nobody will ever use backprop directly.

Whether it's JAX, TF, PyTorch, etc., the chain rule will be applied for you. I'm arguing that it's helpful to not have to worry about the details of how your derivative is being computed, and instead to build an intuition for using derivatives as an abstraction. To be fair, I think Karpathy is correct for people who are explicitly learning to be experts in neural networks.

My point is more that, given how powerful our tools for computing derivatives are today (I think JAX/Autograd have improved since Karpathy wrote that article), it's better to teach programmers to think of derivatives, gradients, Hessians, etc. as high-level abstractions, worrying less about how to compute them and more about how to use them. This way, thinking about modeling doesn't need to be restricted strictly to NNs. Instead, you can use NNs as an example and then demonstrate to the student that they are free to build any model by defining how the model predicts, scoring the prediction, and using the tools of calculus to answer other common questions they might have.
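
As a rough illustration of that "predict, score, differentiate" workflow, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the names `predict`, `score`, and `fit`, and especially `grad`, which is a hypothetical helper that uses central finite differences as a stand-in for what an AD library such as JAX or Autograd would provide. The model is deliberately not a neural network.

```python
# Hypothetical stand-in for an AD library's gradient function.
# It uses central finite differences, NOT automatic differentiation,
# so that the sketch runs with the standard library alone.
def grad(f, eps=1e-6):
    def df(params):
        g = []
        for i in range(len(params)):
            up = list(params)
            up[i] += eps
            dn = list(params)
            dn[i] -= eps
            g.append((f(up) - f(dn)) / (2 * eps))
        return g
    return df

# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def predict(params, x):
    # How the model predicts: a straight line.
    slope, intercept = params
    return slope * x + intercept

def score(params):
    # How the prediction is scored: mean squared error.
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(params, steps=2000, lr=0.05):
    # Plain gradient descent; the derivative is treated as a black box.
    d_score = grad(score)
    for _ in range(steps):
        g = d_score(params)
        params = [p - lr * gi for p, gi in zip(params, g)]
    return params

params = fit([0.0, 0.0])
print(params)  # should end up close to slope 2, intercept 1
```

Swapping in a different `predict` or `score` requires no change to `fit`, which is the sense in which the derivative acts as an abstraction here.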

edit: a good analogy is logic programming and backtracking/unification. The entire point of logic programming is to abstract away backtracking. Sure, experts in Prolog do need to understand backtracking, but it's more helpful to get beginners to understand how Prolog behaves than to understand the details of backtracking.

[0] https://karpathy.medium.com/yes-you-should-understand-backpr...

but with backprop you do not worry about computing derivatives by hand. backprop and AD in general means you do not have to do that. maybe one of us is misunderstanding the other
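
for illustration, a toy sketch of reverse-mode AD on scalars, in plain Python. the `Var` class and the `tanh` and `backward` helpers are hypothetical names made up for this example and are not any library's API. each operation records its local derivative, and one backward pass applies the chain rule, so no derivative of the full expression is written by hand.

```python
import math

class Var:
    """A scalar that remembers how it was computed."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent Var, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    __radd__ = __add__
    __rmul__ = __mul__

def tanh(x):
    t = math.tanh(x.value)
    return Var(t, ((x, 1.0 - t * t),))

def backward(output):
    # Order the graph so each node is processed after everything that uses it.
    order, seen = [], set()
    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)
    visit(output)

    # Propagate d(output)/d(node) from the output back to the inputs.
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad

# A single "neuron": y = tanh(w * x + b)
w, x, b = Var(0.5), Var(2.0), Var(0.1)
y = tanh(w * x + b)
backward(y)
print(w.grad, x.grad, b.grad)
```

here `w.grad` comes out as `x * (1 - y**2)`, which is what the chain rule gives for this expression.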

i am saying that if you want to work with ML algorithms on a deeper level you must learn backprop

if you want to implement some models on the other hand, you can just follow a recipe approach