Hacker News
by srean 2730 days ago
It's a small mercy that one does not find the global minimum of the training loss for DNNs. Given their number of parameters and the absence of an explicit regularization term, they would overfit viciously. The inability to reach the global minimum and the jiggling around imposed by stochastic gradient descent act as implicit regularization.
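To make that concrete, here is a minimal sketch, not a DNN: a hypothetical overparameterized linear model (200 weights, 20 training points) stands in for the network. It compares the exact global minimum of the training loss, which interpolates the noisy labels, against minibatch SGD stopped after a fixed number of steps. The data, sizes, step size, and the name `run_demo` are all assumptions chosen for illustration. Which of the two generalizes better depends on the noise level and the settings, so the script only reports the numbers.

```python
import numpy as np


def run_demo(seed=0, n_train=20, n_features=200, noise=0.5,
             steps=200, batch=5, lr=1e-3):
    """Compare the exact training-loss minimizer with early-stopped SGD.

    Toy setup (an assumption, not from the comment): a linear model
    with many more parameters than training points and noisy labels.
    """
    rng = np.random.default_rng(seed)

    # Ground truth uses only a few features; the rest are irrelevant.
    w_true = np.zeros(n_features)
    w_true[:5] = 1.0

    X_train = rng.normal(size=(n_train, n_features))
    y_train = X_train @ w_true + noise * rng.normal(size=n_train)
    X_test = rng.normal(size=(1000, n_features))
    y_test = X_test @ w_true  # noise-free targets for evaluation

    def mse(w, X, y):
        return float(np.mean((X @ w - y) ** 2))

    # Global minimum of the training loss: the minimum-norm solution
    # that fits every noisy training label exactly.
    w_exact = np.linalg.pinv(X_train) @ y_train

    # Minibatch SGD from zero, stopped well before convergence.
    w_sgd = np.zeros(n_features)
    for _ in range(steps):
        idx = rng.choice(n_train, size=batch, replace=False)
        Xb, yb = X_train[idx], y_train[idx]
        grad = 2.0 * Xb.T @ (Xb @ w_sgd - yb) / batch
        w_sgd -= lr * grad

    return {
        "train_mse_exact": mse(w_exact, X_train, y_train),
        "test_mse_exact": mse(w_exact, X_test, y_test),
        "train_mse_sgd": mse(w_sgd, X_train, y_train),
        "test_mse_sgd": mse(w_sgd, X_test, y_test),
    }


if __name__ == "__main__":
    for name, value in run_demo().items():
        print(f"{name}: {value:.4g}")
```

The exact solution drives the training error to essentially zero by fitting the label noise, while the stopped SGD run leaves some training error on the table. Varying `steps`, `lr`, and `noise` shows how much the stopping point matters.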