Hacker News
by acchow 1166 days ago
Can someone explain why computing a delta needs to hold the entire model in memory at once? Can't it just do one layer at a time?
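
To make the question concrete, here is a rough sketch of the layer-at-a-time approach. It assumes each checkpoint can be read one layer at a time, which may not hold for every checkpoint format. The names `layerwise_delta`, `load_base`, and `load_tuned` are hypothetical, and plain lists of floats stand in for real tensors.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Tensor = List[float]  # stand-in for a real tensor type (assumption)


def layerwise_delta(
    layer_names: Iterable[str],
    load_base: Callable[[str], Tensor],   # hypothetical per-layer loader
    load_tuned: Callable[[str], Tensor],  # hypothetical per-layer loader
) -> Iterator[Tuple[str, Tensor]]:
    """Yield (name, tuned - base) for one layer at a time.

    Only the two copies of the current layer are held at once, so peak
    memory scales with the largest layer rather than the whole model.
    """
    for name in layer_names:
        base = load_base(name)
        tuned = load_tuned(name)
        if len(base) != len(tuned):
            raise ValueError(f"shape mismatch in layer {name!r}")
        yield name, [t - b for b, t in zip(base, tuned)]


# Toy usage: dicts stand in for checkpoints that support per-layer reads.
base_model = {"layer0": [1.0, 2.0], "layer1": [0.5]}
tuned_model = {"layer0": [1.5, 2.0], "layer1": [0.25]}

deltas = dict(
    layerwise_delta(base_model, base_model.__getitem__, tuned_model.__getitem__)
)
print(deltas)
```

In a real pipeline each yielded delta would be written to disk before the next layer is loaded, rather than collected into a dict as the toy usage does.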