by esafak 408 days ago
In general you do, because the unbiased estimates have higher generalization error. You are already dealing with sampling noise. I am not an expert in optimization, and I don't know what "poorly understood" means to you, but I know there is a fair amount of research on the properties of SGD noise; e.g., https://francisbach.com/rethinking-sgd-noise/

"Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning": https://arxiv.org/abs/2301.13703