Hacker News
by
make3
3113 days ago
My understanding is that the full Hessian of the loss is too expensive to compute at every step relative to the gain in learning speed it buys you.
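To put rough numbers on that cost: a dense Hessian for a model with n parameters has n × n entries, and solving the Newton system with a direct method is O(n³) on top of that. The sketch below only counts storage; `hessian_cost` is a hypothetical helper, and 4 bytes per entry (float32) is an assumption.

```python
def hessian_cost(n_params, bytes_per_entry=4):
    """Return (entries, bytes) needed to store a dense n x n Hessian.

    bytes_per_entry=4 assumes float32 storage (an assumption, not from the thread).
    """
    entries = n_params * n_params
    return entries, entries * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        entries, nbytes = hessian_cost(n)
        print(f"{n} params: {entries} Hessian entries, {nbytes / 1e9} GB")
```

The quadratic growth is the point: going from a thousand to a million parameters multiplies the storage by a million, before any time is spent computing or inverting the matrix.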
_0ffh
3113 days ago
Yeah, I think that's why quasi-Newton methods like BFGS were developed.
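For readers unfamiliar with the idea, here is a minimal pure-Python sketch of BFGS. It never forms the Hessian; it builds an approximation to the inverse Hessian from successive gradient differences. The names `bfgs`, `f`, and `grad`, the backtracking line search, and the toy quadratic are all illustrative assumptions, not something from the thread. In practice you would likely reach for a library routine such as `scipy.optimize.minimize(..., method="BFGS")`.

```python
def bfgs(f, grad, x0, iters=200, tol=1e-8):
    """Minimize f with BFGS, using only function values and gradients."""
    n = len(x0)
    x = list(x0)
    # H approximates the INVERSE Hessian; start from the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(iters):
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # Quasi-Newton search direction: p = -H g
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        # Backtracking line search (Armijo sufficient-decrease condition)
        t, fx = 1.0, f(x)
        slope = sum(gi * pi for gi, pi in zip(g, p))
        while t > 1e-12 and f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]   # step taken
        y = [a - b for a, b in zip(g_new, g)]   # change in gradient
        sy = sum(a * b for a, b in zip(s, y))
        if sy > 1e-12:  # skip the update if the curvature condition fails
            rho = 1.0 / sy
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(a * b for a, b in zip(y, Hy))
            # Expanded form of (I - rho s y^T) H (I - rho y s^T) + rho s s^T,
            # valid because H is symmetric.
            for i in range(n):
                for j in range(n):
                    H[i][j] += (-rho * (s[i] * Hy[j] + Hy[i] * s[j])
                                + rho * (1.0 + rho * yHy) * s[i] * s[j])
        x, g = x_new, g_new
    return x


# Toy problem (an assumption for illustration): a convex quadratic bowl
# whose minimum is at (1, -2).
def f(v):
    return (v[0] - 1.0) ** 2 + 10.0 * (v[1] + 2.0) ** 2


def grad(v):
    return [2.0 * (v[0] - 1.0), 20.0 * (v[1] + 2.0)]


if __name__ == "__main__":
    print(bfgs(f, grad, [0.0, 0.0]))
```

Each update costs O(n²) instead of the O(n³) of a Newton solve, and needs only gradients. The approximation still takes n × n storage, which is why limited-memory variants such as L-BFGS exist for large models.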