Hacker News new | ask | show | jobs
by delifue 661 days ago
> But now we get to use a key feature of infinitesimal changes: that they can always be thought of as just “adding linearly” (essentially because ε2 can always be ignored to ε). Or, in other words, we can summarize any infinitesimal change just by giving its “direction” in weight space

> a standard result from calculus gives us a vastly more efficient procedure that in effect “maximally reuses” parts of the computation that have already been done.

This partially explains why gradient descent becomes mainstream.