HN Mirror

Hacker News


	by quadrature 302 days ago
	I'm not very well versed, but I believe training requires more memory because the intermediate computations (the activations from the forward pass) have to be stored so that you can calculate the gradients for each layer during backpropagation.
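A minimal sketch of the idea, assuming a tiny fully connected ReLU network written in pure Python with loss = sum of the outputs. The layer sizes and the function names (`forward_inference`, `forward_training`, `backward`, `cached_floats`) are hypothetical and only for illustration. Inference can discard each activation as soon as the next layer has been computed. Training keeps every layer's input and pre-activation, because the backward pass needs them to form each layer's gradient.

```python
import random

# Assumption: a toy fully connected ReLU net with loss = sum(outputs).
# All names and sizes here are illustrative, not from the thread.

def make_weights(sizes, seed=0):
    """One weight matrix per layer, shaped (n_out, n_in)."""
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
            for n_in, n_out in zip(sizes, sizes[1:])]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward_inference(weights, x):
    """Inference: only the current layer's input and output are alive at once."""
    peak = 0
    for W in weights:
        y = [max(0.0, z) for z in matvec(W, x)]
        peak = max(peak, len(x) + len(y))
        x = y  # the previous activation can now be thrown away
    return x, peak

def forward_training(weights, x):
    """Training: keep every layer's input and pre-activation for the backward pass."""
    cache = []
    for W in weights:
        z = matvec(W, x)
        cache.append((x, z))
        x = [max(0.0, zi) for zi in z]
    return x, cache

def backward(weights, cache, grad_out):
    """Backpropagate dL/d(output), reading the cached values layer by layer."""
    grads = [None] * len(weights)
    g = grad_out
    for i in reversed(range(len(weights))):
        x_in, z = cache[i]
        W = weights[i]
        # The ReLU derivative needs the stored pre-activation z
        g = [gi if zi > 0 else 0.0 for gi, zi in zip(g, z)]
        # dL/dW = outer(g, x_in) needs the stored layer input x_in
        grads[i] = [[gi * xj for xj in x_in] for gi in g]
        # dL/dx_in = W^T g is handed to the previous layer
        g = [sum(W[r][c] * g[r] for r in range(len(W))) for c in range(len(x_in))]
    return grads

def cached_floats(cache):
    return sum(len(x_in) + len(z) for x_in, z in cache)

ws = make_weights([4, 8, 8, 2])
x = [0.5, -1.0, 2.0, 0.1]
out, cache = forward_training(ws, x)
grads = backward(ws, cache, [1.0] * len(out))  # dL/d(out) = 1 for loss = sum(out)
_, peak = forward_inference(ws, x)
print("inference peak floats:", peak)                   # prints 16
print("training cached floats:", cached_floats(cache))  # prints 38
```

In this toy setup the training cache grows with depth (each extra 8-wide hidden layer adds another 16 cached floats), while inference memory stays bounded by the widest pair of adjacent layers.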