| HN Mirror



	by stu2b50 1185 days ago
	I mean, the derivative of a constant is 0. So if all of the original weights are treated as constants, then computing their gradients is trivial, since they're just zero.

1 comment

jprafael 1185 days ago

Computing gradients is easy/cheap. What this technique solves is that you no longer need to store the computed values of the gradient until the backpropagation phase, which saves on expensive GPU RAM, allowing you to use commodity hardware.
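
To put a rough number on the memory point, here is a minimal sketch that counts the bytes needed to hold one gradient value per trainable weight, with and without the original weights frozen. The layer names, layer sizes, the small "adapter" block of newly added trainable weights, and the 4-byte value size are all assumptions made up for illustration, not details from the thread. It only counts gradient buffers and ignores activations and optimizer state.

```python
# Hypothetical layer sizes (number of weights per layer); not from the thread.
LAYERS = {
    "base.layer1": 4096 * 4096,
    "base.layer2": 4096 * 4096,
    "adapter": 2 * 4096 * 8,  # assumed small block of newly added trainable weights
}


def gradient_buffer_bytes(layers, trainable, bytes_per_value=4):
    """Bytes needed to hold one gradient value for every trainable weight.

    Weights in layers not listed in `trainable` are treated as constants,
    so no gradient buffer is counted for them.
    """
    count = sum(n for name, n in layers.items() if name in trainable)
    return count * bytes_per_value


# Everything trainable: every weight needs a gradient slot.
full = gradient_buffer_bytes(LAYERS, trainable=set(LAYERS))

# Original weights frozen: only the assumed adapter block needs gradient slots.
frozen = gradient_buffer_bytes(LAYERS, trainable={"adapter"})

print(f"all weights trainable: {full / 2**20:.2f} MiB of gradient storage")
print(f"base weights frozen:   {frozen / 2**20:.2f} MiB of gradient storage")
```

With these made-up sizes the frozen case needs a small fraction of the gradient storage, which is the kind of saving that could make a smaller GPU workable.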
