by Imnimo 1625 days ago
Yeah, the autograd choice struck me as odd. Given how simple the model is, it feels like it would have been easy to show how to compute gradients. The whole benefit of having this super simple toy problem is that we can reason about the meaning of individual weights - it's a perfect opportunity to build clear intuition about gradients and weight updates. Switching to torch is just substituting one black box for another - to a novice reader, the torch code is just magical incantations.
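To make that concrete, here is a minimal sketch of what a hand-computed gradient could look like. It assumes a hypothetical one-weight linear model with a squared-error loss, which is not necessarily the model from the article; the names `loss`, `manual_grads`, `numeric_grads`, and `sgd_step` are made up for illustration. The finite-difference check is only there to show the hand-derived formulas agree with a numerical estimate.

```python
# Hypothetical toy model (an assumption, not the article's model):
#   y_hat = w * x + b,  loss = (y_hat - y)^2  for a single example.

def loss(w, b, x, y):
    return (w * x + b - y) ** 2

def manual_grads(w, b, x, y):
    # Chain rule: dL/dy_hat = 2 * (y_hat - y), dy_hat/dw = x, dy_hat/db = 1.
    err = w * x + b - y
    return 2 * err * x, 2 * err

def numeric_grads(w, b, x, y, eps=1e-6):
    # Central finite differences, used as an independent check.
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

def sgd_step(w, b, x, y, lr=0.1):
    # One plain gradient-descent update using the hand-derived gradients.
    dw, db = manual_grads(w, b, x, y)
    return w - lr * dw, b - lr * db

if __name__ == "__main__":
    w, b, x, y = 0.5, 0.0, 2.0, 3.0
    print("manual :", manual_grads(w, b, x, y))
    print("numeric:", numeric_grads(w, b, x, y))
    print("after one step:", sgd_step(w, b, x, y))
```

The point of the sketch is that each gradient has a readable meaning: the weight's gradient is the prediction error scaled by the input, and the bias's gradient is the error itself.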

This could be the start of a breadth-first approach, where you start with very little code and then dig deep into things like autograd or "backprop" as you get interested in such details.

It seems to me that trying to give explicit formulas for gradients is just swamping the beginner with unnecessary details that don't help to build intuition. I think the author made exactly the right choices.

It used to be that some NN tutorials would swamp the beginner with backprop formulas, which beginners were forced by their professors to memorise. I don't think this achieved much. It only made the subject seem more complicated than it needed to be, and I think it should all be abstracted away into autograd.