by jampekka 412 days ago
With batching it becomes SGD. If you're OK with approximations, there are e.g. randomized, reduced-rank and streaming SVDs, and these tend to have much nicer approximation and convergence properties than SGD.
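To make the reduced-rank route concrete, here is a minimal sketch, assuming NumPy and a Halko-Martinsson-Tropp style randomized range finder. `randomized_svd` and `reduced_rank_ols` are hypothetical names for illustration (scikit-learn ships a comparable `sklearn.utils.extmath.randomized_svd`). It keeps all of X in memory, so it is not a streaming variant. It only shows that OLS restricted to the top singular directions is a few QRs plus one small SVD, with no learning rate to tune.

```python
import numpy as np

def randomized_svd(X, rank, n_oversample=10, n_iter=2, seed=0):
    """Approximate truncated SVD of X via a randomized range finder."""
    rng = np.random.default_rng(seed)
    k = min(rank + n_oversample, min(X.shape))
    # Random test matrix: X @ Omega approximately spans X's dominant column space
    omega = rng.standard_normal((X.shape[1], k))
    Q, _ = np.linalg.qr(X @ omega)
    # Power iterations sharpen the subspace when singular values decay slowly
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(X.T @ Q)
        Q, _ = np.linalg.qr(X @ Q)
    # Project X onto the small subspace and take an exact (small) SVD there
    B = Q.T @ X
    U_b, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_b
    return U[:, :rank], s[:rank], Vt[:rank]

def reduced_rank_ols(X, y, rank, **kwargs):
    """OLS coefficients restricted to the top-`rank` singular directions of X."""
    U, s, Vt = randomized_svd(X, rank, **kwargs)
    # beta = V diag(1/s) U^T y, i.e. the truncated pseudoinverse applied to y
    return Vt.T @ ((U.T @ y) / s)
```

With `rank` equal to the number of columns of a full-rank X, this reproduces the exact least-squares solution. With a smaller `rank` it trades accuracy for cost in a controlled way, since the error is governed by the discarded singular values.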

What are the common use cases for OLS with 10^5 parameters? Perhaps something like weather models could include such computations?