Hacker News new | ask | show | jobs
by nshm 3703 days ago
Strictly speaking if you split the parameter set on batches and iterate over batches optimizing each set of parameters with a gradient, it is not strictly a gradient decent, it is more a combination of coordinate decent (because you select the subset of coordinates to optimize first) and a gradient decent.
1 comments

Ah yes - that sounds like the stochastic gradient descent I've been hearing about. That makes a lot of sense for very expensive models. Thanks for the response nshm - I've recently taken an interest in ML (coming in with some familiarity with optimization), and it's much appreciated to have some 'REPL' in the learning process.