Hacker News new | ask | show | jobs
by esafak 408 days ago
...because stochastic methods are implicit regularizers, leading to solutions that generalize better. Let's spell it out for those that don't know.

https://www.inference.vc/notes-on-the-origin-of-implicit-reg...

1 comments

OLS is a convex optimization problem, so this doesn't really apply. And for statistical analysis you really don't want to add poorly understood artificial noise to the parameter estimates anyway.
In general you do, because the unbiased estimates have higher generalization error. You are already dealing with sampling noise. I am not an expert in optimization, and what "poorly understood" means to you, but I know there is quite some research on the properties of SGD noise; e.g., https://francisbach.com/rethinking-sgd-noise/

Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning https://arxiv.org/abs/2301.13703