by unishark
2123 days ago
Interestingly (another empirical result that's poorly understood), stochastic gradient descent often converges after a single pass through the data, and when it doesn't, it usually takes only a few. And yes, this field only exists because we are presuming a really large amount of data, often more than fits on a single hard drive, and a really complex model. Older kernel methods basically do what you want for tractable datasets. They can fit very high-order polynomials and also let you regularize the solution in various ways. Though again, I would be interested in seeing those methods compared to a simple least-squares fit like the one you propose, which people often didn't do even back when kernel methods were all the rage.
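
For anyone who wants to try that comparison, here is a rough sketch of what it could look like on a toy problem: kernel ridge regression with a polynomial kernel next to a plain least-squares polynomial fit of the same degree. Everything specific here is my own assumption for illustration: the synthetic sine data, the degree (`DEGREE`), the penalty (`LAM`), and the helper name `poly_kernel`. It is not a benchmark and says nothing about real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (synthetic data, purely illustrative).
x_train = rng.uniform(-1.0, 1.0, size=200)
y_train = np.sin(3.0 * x_train) + 0.1 * rng.standard_normal(200)
x_test = np.linspace(-1.0, 1.0, 50)

DEGREE = 3   # polynomial order (arbitrary choice for the sketch)
LAM = 1e-3   # ridge penalty (arbitrary choice for the sketch)


def poly_kernel(a, b, degree=DEGREE):
    """Polynomial kernel k(a, b) = (1 + a*b)^degree for 1-D inputs."""
    return (1.0 + np.outer(a, b)) ** degree


# Kernel ridge regression: solve (K + lam*I) alpha = y,
# then predict with K_test @ alpha.
K = poly_kernel(x_train, x_train)
alpha = np.linalg.solve(K + LAM * np.eye(len(x_train)), y_train)
y_kernel = poly_kernel(x_test, x_train) @ alpha

# Plain least-squares polynomial fit of the same degree, no regularization.
coeffs = np.polyfit(x_train, y_train, DEGREE)
y_lstsq = np.polyval(coeffs, x_test)

# How far apart are the two sets of predictions on the test grid?
max_gap = np.max(np.abs(y_kernel - y_lstsq))
print(f"max |kernel ridge - least squares| on test grid: {max_gap:.2e}")
```

The kernel version never builds the polynomial features explicitly, which is the appeal for high orders, but it has to solve an n-by-n system in the number of training points. That cost is why I'd only call it practical for tractable datasets.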