

	by make3 3113 days ago
	My understanding is that the full Hessian of the loss is too expensive to compute at every step, relative to the gain in learning speed it would buy.

1 comment

Yeah, I think that's why quasi-Newton methods like BFGS were developed.
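
To make that concrete, here is a rough sketch of BFGS in plain Python. It never evaluates the Hessian; it builds an estimate `H` of the inverse Hessian from successive gradient differences. The toy quadratic, the function names (`f`, `grad`, `bfgs`), the backtracking line search and all constants are my own assumptions for illustration, not taken from any particular library.

```python
def f(x):
    # Toy objective (an assumption for this sketch):
    # f(x) = 0.5 * x^T A x - b^T x with A = [[3, 1], [1, 2]], b = [1, 1].
    return 0.5 * (3.0 * x[0] ** 2 + 2.0 * x[0] * x[1] + 2.0 * x[1] ** 2) - x[0] - x[1]


def grad(x):
    # Gradient of the toy objective: A x - b.
    return [3.0 * x[0] + x[1] - 1.0, x[0] + 2.0 * x[1] - 1.0]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def bfgs(f, grad, x0, tol=1e-6, max_iter=100):
    n = len(x0)
    x = list(x0)
    # H approximates the INVERSE Hessian; start from the identity.
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = grad(x)
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        # Quasi-Newton direction: p = -H g (no Hessian evaluation, no linear solve).
        p = [-v for v in matvec(H, g)]
        # Simple Armijo backtracking line search.
        t, fx, slope = 1.0, f(x), dot(g, p)
        while f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad(x_new)
        s = [a - b for a, b in zip(x_new, x)]  # step taken
        y = [a - b for a, b in zip(g_new, g)]  # change in gradient
        sy = dot(s, y)
        if sy > 1e-12:  # skip the update if the curvature condition fails
            Hy = matvec(H, y)
            yHy = dot(y, Hy)
            # BFGS inverse update:
            # H += (1 + y^T H y / s^T y) s s^T / s^T y - (H y s^T + s y^T H) / s^T y
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + yHy / sy) * s[i] * s[j] / sy
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x


x_min = bfgs(f, grad, [0.0, 0.0])
print(x_min)  # should be close to the exact minimizer [0.2, 0.4]
```

Note that this version still stores a dense n-by-n matrix, so for models with millions of parameters you would reach for a limited-memory variant such as L-BFGS, which keeps only a handful of recent `s` and `y` vectors.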