by jprafael 1185 days ago
Computing gradients is easy and cheap. What this technique solves is that you no longer need to keep the intermediate values computed during the forward pass in memory until the backpropagation phase. That saves expensive GPU RAM and lets you use commodity hardware.
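To put rough numbers on the memory argument, here is a back-of-the-envelope sketch in Python. It is not the technique itself: the function names, the layer sizes, and the assumption that each layer's output can be discarded as soon as the next layer has been computed are all mine, purely for illustration.

```python
def stored_for_backprop_bytes(batch_size, layer_widths, bytes_per_value=4):
    """Memory needed if every layer's output is kept until the backward pass."""
    return batch_size * sum(layer_widths) * bytes_per_value


def streaming_peak_bytes(batch_size, layer_widths, bytes_per_value=4):
    """Peak memory if only a layer's input and output are alive at once.

    Assumption (hypothetical): outputs of earlier layers can be freed
    as soon as the next layer has consumed them.
    """
    if len(layer_widths) < 2:
        return stored_for_backprop_bytes(batch_size, layer_widths, bytes_per_value)
    widest_pair = max(a + b for a, b in zip(layer_widths, layer_widths[1:]))
    return batch_size * widest_pair * bytes_per_value


if __name__ == "__main__":
    # Made-up network: 50 layers, 4096 units each, batch of 64, float32 values.
    widths = [4096] * 50
    mib = 2 ** 20
    print(stored_for_backprop_bytes(64, widths) / mib)  # 50.0
    print(streaming_peak_bytes(64, widths) / mib)       # 2.0
```

The gap grows linearly with depth, which is why the stored values, not the gradient arithmetic, tend to be what pushes a model off a consumer GPU. Real frameworks also hold weights, optimizer state, and workspace buffers, so treat these figures as a lower bound on the stored-values part only.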