Hacker News
by blurbleblurble 817 days ago
The thing I like about this is that it frames all these optimization techniques, automatic differentiation (AD), etc. in the context of control flow rather than just some trending neural network architecture. It doesn't assume you'll be using these techniques in a specific bubble; it gives the rest of us access to a broader perspective that experienced researchers have been slowly brewing for decades.

I've been trying to learn how to apply gradient descent to a non-neural-network problem by following a paper, and I've found it very difficult to find introductory resources or code libraries that aren't explicitly geared toward training neural networks and running inference on them.
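
For anyone in the same spot, here is a minimal sketch of gradient descent with no neural network and no ML library. It is not the problem from the paper I mentioned; the toy data, the function names (`loss_and_grad`, `fit_line`), the learning rate, and the step count are all assumptions picked for illustration. It fits a line y = a*x + b by least squares, with the gradient written out by hand.

```python
# Plain gradient descent on a non-neural problem: fit y = a*x + b.
# Everything here (data, names, lr, steps) is illustrative, not from any paper.

def loss_and_grad(a, b, xs, ys):
    """Mean squared error and its partial derivatives w.r.t. a and b."""
    n = len(xs)
    loss = grad_a = grad_b = 0.0
    for x, y in zip(xs, ys):
        r = a * x + b - y            # residual for this point
        loss += r * r / n
        grad_a += 2.0 * r * x / n    # d(loss)/da
        grad_b += 2.0 * r / n        # d(loss)/db
    return loss, grad_a, grad_b

def fit_line(xs, ys, lr=0.05, steps=2000):
    """Start from a = b = 0 and repeatedly step against the gradient."""
    a, b = 0.0, 0.0
    for _ in range(steps):
        _, grad_a, grad_b = loss_and_grad(a, b, xs, ys)
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]       # generated from y = 2x + 1
a, b = fit_line(xs, ys)
print(a, b)
```

The only problem-specific part is `loss_and_grad`. Swapping in a different objective means rewriting that one function, or handing the derivative work to an AD tool instead of deriving it by hand.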