| HN Mirror


	by nshm 3703 days ago
	Strictly speaking, if you split the parameter set into batches and iterate over the batches, optimizing each subset of parameters with a gradient step, it is not pure gradient descent. It is closer to a combination of coordinate descent (because you first select the subset of coordinates to optimize) and gradient descent.
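
	To make the distinction concrete, here is a minimal sketch of that hybrid, usually called block coordinate gradient descent. Everything in it is an assumption chosen for illustration, not something from the thread: the toy quadratic objective, the names `objective`, `partial_gradient` and `block_coordinate_descent`, the coupling constant, the step size, and the two-block split. Each inner step takes a gradient step on one block of parameters while the others stay fixed.

```python
# Toy objective (assumed for illustration):
#   f(w) = 0.5 * sum_i (w_i - t_i)^2 + 0.5 * c * (sum_i w_i)^2
# The coupling term makes the coordinates interact, so updating one
# block changes the gradient seen by the other block.

def objective(w, targets, coupling):
    total = sum(w)
    fit = 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, targets))
    return fit + 0.5 * coupling * total ** 2


def partial_gradient(w, targets, coupling, block):
    """Gradient of f restricted to the coordinates listed in `block`."""
    total = sum(w)
    return {i: (w[i] - targets[i]) + coupling * total for i in block}


def block_coordinate_descent(targets, blocks, coupling=0.5, lr=0.1, sweeps=500):
    w = [0.0] * len(targets)
    for _ in range(sweeps):
        # Coordinate-descent part: visit one subset of parameters at a time.
        for block in blocks:
            # Gradient-descent part: take a gradient step on that subset only.
            g = partial_gradient(w, targets, coupling, block)
            for i in block:
                w[i] -= lr * g[i]
    return w


targets = [1.0, 2.0, 3.0, 4.0]
blocks = [[0, 1], [2, 3]]  # hypothetical split of the parameter set
w = block_coordinate_descent(targets, blocks)
print(w)
```

	With a single block containing every index this reduces to ordinary gradient descent, and with one index per block it becomes a coordinate-wise gradient method. Note that the split here is over parameters, not over training examples.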

1 comment

ctandre 3703 days ago

Ah yes - that sounds like the stochastic gradient descent I've been hearing about. That makes a lot of sense for very expensive models. Thanks for the response, nshm - I've recently taken an interest in ML (coming in with some familiarity with optimization), and it's much appreciated to have some 'REPL' in the learning process.
