Hacker News
by quadrature 251 days ago
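I'm not very well versed, but I believe training requires more memory because the intermediate results (activations) from the forward pass have to be kept around so that the backward pass can calculate gradients for each layer. During inference you can throw each layer's activations away as soon as the next layer has consumed them.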
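To make the point concrete, here is a rough toy sketch in plain Python. It is not how any real framework does it: the "network" is just a chain of scalar layers `y = tanh(w * x)`, and the function names `forward_inference` and `forward_backward` are made up for illustration. The only thing it is meant to show is that the training path has to hold on to one saved entry per layer, while the inference path only ever needs the current value.

```python
import math

def forward_inference(x, weights):
    # Inference: each activation overwrites the previous one,
    # so only a single value is live at any time.
    for w in weights:
        x = math.tanh(w * x)
    return x

def forward_backward(x, weights):
    # Training: keep every layer's input and output, because the
    # backward pass needs them to compute that layer's gradient.
    saved = []  # one (input, output) pair per layer
    for w in weights:
        y = math.tanh(w * x)
        saved.append((x, y))
        x = y

    grad_out = 1.0  # gradient of the final output with respect to itself
    grads = [0.0] * len(weights)
    for i in reversed(range(len(weights))):
        x_in, y = saved[i]
        local = 1.0 - y * y                        # derivative of tanh at this layer
        grads[i] = grad_out * local * x_in         # d(output)/d(w_i)
        grad_out = grad_out * local * weights[i]   # pass the gradient to the layer below
    return x, grads, len(saved)
```

With `n` layers the `saved` list grows to `n` entries, which is the extra memory in question. In a real model each entry is a whole tensor per layer (times the batch size), which is why the gap between training and inference gets so large.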