Hacker News new | ask | show | jobs
by calaphos 1108 days ago
For automatic differentiation (backpropagation) you need to store the intermediate results per layer of the forward pass. With checkpointing you can only store every nth layer and recompute the rest accordingly to reduce memory requirements in favor of more compute.
1 comments

What intermediate results you need to store?

For backpropagation you take the diff between actual and expected output and you go backwards to calculate derivate and apply it with optimiser - that's 8 extra bytes for single precision floats per trainable parameter.

Why do you need 80?