|
|
|
|
|
by srean
2730 days ago
|
|
Its a small mercy that one does not find the global minimum of DNNs. Given their number of parameters and no explicit regularization term, they would overfit viciously. The inability to reach the global and the jiggling around imposed by stochastic gradient descent acts as implicit regularization |
|