by wsmoses 1957 days ago
You don't always need the input to compute the gradient. For example, the gradient of a sum function doesn't require the original input; it just sets every element of derivative(input) to 1.
1 comment

To be more precise, in reverse-mode auto-diff, inputs need to be saved only if they are used in a non-linear way.
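
A rough sketch of the distinction, in plain Python with hand-written backward rules. The function names and the "saved" convention here are made up for illustration and are not taken from any particular autodiff tool: the linear op (a sum) saves only the input length, while the non-linear op (a sum of squares) has to save the input values.

```python
# Hypothetical hand-written reverse-mode rules; names are illustrative only.

def sum_forward(xs):
    # Linear op: only the input length is saved, not the values.
    saved = len(xs)
    return sum(xs), saved

def sum_backward(saved, grad_out):
    # d(sum)/dx_i = 1, so each input receives the upstream gradient unchanged.
    return [grad_out] * saved

def square_sum_forward(xs):
    # Non-linear op: the backward pass needs the input values, so save them.
    saved = list(xs)
    return sum(x * x for x in xs), saved

def square_sum_backward(saved, grad_out):
    # d(sum of x_i^2)/dx_i = 2 * x_i, which depends on the saved input.
    return [2.0 * x * grad_out for x in saved]
```

With an upstream gradient of 1, sum_backward returns all ones regardless of what the inputs were, whereas square_sum_backward cannot produce its result without the saved values.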