| HN Mirror



	by easygenes 409 days ago
	OLS is a straightforward way to introduce GD, and although an analytic solution exists, it becomes memory- and IO-bound at sufficient scale, so GD is still a practical option.

1 comments

jampekka 409 days ago

Computationally, OLS is taking the pseudoinverse of the system matrix, which for dense systems has a complexity of O(samples * parameters^2). For some GD implementations the complexity of a single step is probably O(samples * parameters), so there could be an asymptotic benefit, but it's hard to imagine a case where the benefit is even realized, let alone makes a practical difference.
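To make the two costs concrete, here is a minimal NumPy sketch of both approaches on synthetic data. The sizes, the noise level, the fixed step size, and the iteration count are all assumptions picked for a small, well-conditioned toy problem; nothing here is tuned or benchmarked.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 500, 5  # samples, parameters (toy sizes, chosen for illustration)
X = rng.normal(size=(n, p))
w_true = rng.normal(size=p)
y = X @ w_true + 0.01 * rng.normal(size=n)

# Closed form via the pseudoinverse: roughly O(n * p^2) for dense X.
w_ols = np.linalg.pinv(X) @ y

# Full-batch gradient descent on half the mean squared error.
# Each step is two matrix-vector products, so O(n * p) per step.
w_gd = np.zeros(p)
step_size = 0.5  # assumption: hand-picked for this well-conditioned toy data
for _ in range(200):
    grad = X.T @ (X @ w_gd - y) / n
    w_gd -= step_size * grad

# Largest coordinate-wise gap between the two solutions.
print(np.max(np.abs(w_gd - w_ols)))
```

Note that GD only wins asymptotically if the number of steps needed stays well below the number of parameters, which depends on the conditioning of X.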

And in any case nobody uses GD for regressions for statistical analysis purposes. In practice Newton-Raphson or other more complicated schemes (with a lot higher computation, memory and IO demands) with a lot nicer convergence properties are used.


easygenes 409 days ago

Mini-batch and streaming GD make the benefits obvious and trivial. Closed-form OLS is unbeatable so long as samples * params^2 is comfortably sitting in memory. You often lose that as soon as your p approaches 10^5, which is common these days. As soon as you need distributed or streaming computation, or your data is too tall and/or too wide, first-order methods are the first port of call.
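A rough sketch of the streaming case, assuming batches arrive from some iterator and are discarded after use. `stream_batches` is a hypothetical stand-in for a real data source, and the learning rate, batch size, and batch count are illustrative assumptions only.

```python
import numpy as np

def stream_batches(w_true, n_batches, batch_size, rng):
    """Hypothetical data source: yields (X_batch, y_batch) one at a time."""
    p = w_true.shape[0]
    for _ in range(n_batches):
        X_b = rng.normal(size=(batch_size, p))
        y_b = X_b @ w_true + 0.1 * rng.normal(size=batch_size)
        yield X_b, y_b

def minibatch_sgd(batches, p, lr=0.05):
    """One pass of SGD for least squares; only w and one batch are held."""
    w = np.zeros(p)
    for X_b, y_b in batches:
        # Gradient of half the mean squared error on this batch only.
        grad = X_b.T @ (X_b @ w - y_b) / len(y_b)
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
w_true = rng.normal(size=10)
w_sgd = minibatch_sgd(stream_batches(w_true, 2000, 32, rng), p=10)

# Distance from the generating weights after a single pass.
print(np.linalg.norm(w_sgd - w_true))
```

Memory use here is O(batch_size * p) regardless of how many samples pass through, which is the property being argued for. With a constant learning rate the estimate hovers around the solution rather than converging exactly.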


jampekka 409 days ago

With batching it becomes SGD. If you're OK with approximations, you have e.g. randomized, reduced-rank and streaming SVDs. And these tend to have a lot nicer approximation and convergence properties than SGD.
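For illustration, a minimal sketch of one such approach: a randomized range-finder SVD used to form a truncated pseudoinverse solution. The helper name `randomized_svd`, the oversampling of 10, and the exactly low-rank test matrix are assumptions made for this example, not a reference implementation.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, rng=None):
    """Approximate rank-`rank` SVD of A via a random range finder."""
    rng = np.random.default_rng() if rng is None else rng
    # Project A onto a few random directions to capture its range.
    omega = rng.normal(size=(A.shape[1], rank + oversample))
    Q, _ = np.linalg.qr(A @ omega)
    # SVD of the small projected matrix, then map back to the full space.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
n, p, k = 500, 50, 5
X = rng.normal(size=(n, k)) @ rng.normal(size=(k, p))  # exactly rank k
y = X @ rng.normal(size=p)

U, s, Vt = randomized_svd(X, rank=k, rng=rng)
# Minimum-norm least-squares solution from the truncated factors.
w_rsvd = Vt.T @ ((U.T @ y) / s)

# Reference: NumPy's SVD-based least squares (also minimum-norm).
w_ref = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.max(np.abs(w_rsvd - w_ref)))
```

The example uses an exactly rank-k matrix so the two solutions agree to rounding error; on real data the truncation rank controls the approximation quality.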

What are the common cases for 10^5 parameter OLS? Perhaps something like weather models could include such computations?
