HN Mirror



	by eru 578 days ago
	Does anyone actually use 'normal' gradient descent, with the whole training set in every step? I've only ever seen it as a sort of straw man to make explanations easier.

2 comments

jey 578 days ago

Generally, yes: vanilla (full-batch) gradient descent gets plenty of use. But for LLMs, no, it isn't really used. Stochastic gradient descent provides a form of regularization, so it probably works better in addition to being more practical.
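To make the distinction concrete, here is a minimal pure-Python sketch (toy data and function names are hypothetical) fitting y = w*x + b two ways: full-batch gradient descent, where every update uses the whole training set, and mini-batch SGD, where each update uses a small shuffled slice. The data is deliberately noise-free so both should land near w=2, b=1; it shows the mechanics only and does not demonstrate the regularization effect mentioned above.

```python
import random

# Hypothetical noise-free toy data: y = 2x + 1, so both optimizers should approach w=2, b=1.
XS = [i / 10 for i in range(-20, 21)]
YS = [2.0 * x + 1.0 for x in XS]


def mse_grad(w, b, batch):
    """Gradient of the mean of 0.5 * (w*x + b - y)**2 over the (x, y) pairs in batch."""
    gw = gb = 0.0
    for x, y in batch:
        err = w * x + b - y
        gw += err * x
        gb += err
    n = len(batch)
    return gw / n, gb / n


def full_batch_gd(steps=500, lr=0.1):
    """'Normal' gradient descent: every update uses the whole training set."""
    data = list(zip(XS, YS))
    w = b = 0.0
    for _ in range(steps):
        gw, gb = mse_grad(w, b, data)
        w -= lr * gw
        b -= lr * gb
    return w, b


def minibatch_sgd(epochs=50, lr=0.1, batch_size=8, seed=0):
    """Mini-batch SGD: shuffle each epoch, then update on small slices of the data."""
    rng = random.Random(seed)
    data = list(zip(XS, YS))
    w = b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for i in range(0, len(data), batch_size):
            # Cheap, noisy gradient estimate from one mini-batch.
            gw, gb = mse_grad(w, b, data[i:i + batch_size])
            w -= lr * gw
            b -= lr * gb
    return w, b


print(full_batch_gd())  # both results should be close to (2.0, 1.0)
print(minibatch_sgd())
```

Each SGD update costs a fraction of a full pass over the data, which is the practical argument for it at LLM scale, where a single full-batch gradient would mean a pass over the entire corpus.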


bravura 578 days ago

Full-batch training with L-BFGS, when possible, is wildly underappreciated.
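For anyone who hasn't seen it spelled out, here is a minimal pure-Python sketch of full-batch L-BFGS (the standard two-loop recursion) on a hypothetical, somewhat ill-conditioned least-squares fit. It is an illustration under simplifying assumptions: it uses plain Armijo backtracking instead of the Wolfe line search that production implementations such as SciPy's `scipy.optimize.minimize(method="L-BFGS-B")` use, and `loss_and_grad`, `lbfgs`, and the data are made up for this example.

```python
import math

# Hypothetical full-batch least-squares problem: fit y = w*x + b to noise-free y = 2x + 1.
XS = [i / 10 for i in range(101)]
YS = [2.0 * x + 1.0 for x in XS]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def loss_and_grad(params):
    """Mean of 0.5 * (w*x + b - y)**2 and its gradient, computed over the whole dataset."""
    w, b = params
    f = gw = gb = 0.0
    for x, y in zip(XS, YS):
        err = w * x + b - y
        f += 0.5 * err * err
        gw += err * x
        gb += err
    n = len(XS)
    return f / n, [gw / n, gb / n]


def lbfgs(f_and_grad, x0, m=5, max_iter=100, tol=1e-8):
    """Minimal L-BFGS: two-loop recursion plus Armijo backtracking (no Wolfe search)."""
    x = list(x0)
    f, g = f_and_grad(x)
    s_hist, y_hist = [], []  # the last m position and gradient differences
    for _ in range(max_iter):
        if math.sqrt(dot(g, g)) < tol:
            break
        # First loop: newest pair to oldest.
        q = list(g)
        saved = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            rho = 1.0 / dot(y, s)
            alpha = rho * dot(s, q)
            saved.append((alpha, rho, s, y))
            q = [qi - alpha * yi for qi, yi in zip(q, y)]
        # Scale by gamma = s'y / y'y as the initial inverse-Hessian guess.
        gamma = dot(s_hist[-1], y_hist[-1]) / dot(y_hist[-1], y_hist[-1]) if s_hist else 1.0
        r = [gamma * qi for qi in q]
        # Second loop: oldest pair to newest.
        for alpha, rho, s, y in reversed(saved):
            beta = rho * dot(y, r)
            r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
        d = [-ri for ri in r]  # approximately -H^{-1} g
        # Backtracking line search on the Armijo (sufficient decrease) condition.
        step, slope = 1.0, dot(g, d)
        while True:
            x_new = [xi + step * di for xi, di in zip(x, d)]
            f_new, g_new = f_and_grad(x_new)
            if f_new <= f + 1e-4 * step * slope or step < 1e-10:
                break
            step *= 0.5
        s = [xn - xo for xn, xo in zip(x_new, x)]
        y = [gn - go for gn, go in zip(g_new, g)]
        if dot(s, y) > 1e-12:  # skip pairs that would break positive definiteness
            s_hist.append(s)
            y_hist.append(y)
            if len(s_hist) > m:
                s_hist.pop(0)
                y_hist.pop(0)
        x, f, g = x_new, f_new, g_new
    return x


w, b = lbfgs(loss_and_grad, [0.0, 0.0])
print(w, b)  # should be close to 2.0 and 1.0
```

Every `loss_and_grad` call touches all 101 examples. One reason quasi-Newton methods pair naturally with full batches is that the (s, y) curvature pairs come from deterministic gradients; with noisy mini-batch gradients those pairs pick up noise and the inverse-Hessian estimate degrades.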
