by whiteandnerdy
452 days ago
You're correct, and the term you're looking for is "regularisation". There are two common ways of doing this:
* L1 or L2 regularisation: adds a penalty on the size of the weights to the loss, so models whose weight matrices are complex (in the sense of having lots of large elements) are penalised
* Dropout: randomly drops a subset of the neurons at each training step, forcing the model to rely on simple representations that are distributed robustly across its weights
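To make the two bullets concrete, here is a minimal sketch in plain Python. The function names (`l1_penalty`, `l2_penalty`, `dropout`) and the toy weight matrix are made up for illustration, and the dropout shown is the common "inverted" variant that rescales the surviving units; real frameworks differ in the details.

```python
import random

def l1_penalty(weights, lam):
    """L1 penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for row in weights for w in row)

def l2_penalty(weights, lam):
    """L2 penalty: lam times the sum of squared weight values."""
    return lam * sum(w * w for row in weights for w in row)

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors by 1 / (1 - p_drop)."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hypothetical toy values, only to show the calls.
weights = [[0.5, -2.0], [0.0, 3.0]]
print("L1 penalty:", l1_penalty(weights, lam=0.1))
print("L2 penalty:", l2_penalty(weights, lam=0.1))
print("after dropout:", dropout([1.0, 2.0, 3.0, 4.0], p_drop=0.5, rng=random.Random(0)))
```

During training the penalty would be added to the data loss, so large weights cost something, while dropout would be applied to hidden activations and switched off at test time.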
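The Elements of Statistical Learning (Hastie, Tibshirani and Friedman) has a nice argument that, for linear models, L2 regularisation is roughly equivalent to a soft form of dimensionality reduction: ridge regression shrinks the fit most strongly along the low-variance principal components of the inputs. You could use this to motivate a "simplicity prior" idea in deep learning.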
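A small numerical sketch of that shrinkage, assuming a two-feature design whose columns are orthogonal so that the principal directions line up with the coordinate axes (a simplification; in general the shrinkage acts along the principal components of the inputs). The helper `ridge_2d` and the toy data are hypothetical, chosen so the arithmetic is easy to follow.

```python
def ridge_2d(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y for exactly two features."""
    a = sum(r[0] * r[0] for r in X) + lam
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    g0 = sum(r[0] * t for r, t in zip(X, y))
    g1 = sum(r[1] * t for r, t in zip(X, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

def shrinkage_factor(d_squared, lam):
    """Factor by which ridge shrinks a direction with squared singular value d_squared."""
    return d_squared / (d_squared + lam)

# Feature 0 has high variance (squared column norm 36), feature 1 low (4).
X = [[3.0, 1.0], [-3.0, 1.0], [3.0, -1.0], [-3.0, -1.0]]
y = [4.0, -2.0, 2.0, -4.0]  # generated with true coefficients [1, 1]

print("least squares:", ridge_2d(X, y, lam=0.0))
print("ridge, lam=4: ", ridge_2d(X, y, lam=4.0))
print("factors:      ", shrinkage_factor(36.0, 4.0), shrinkage_factor(4.0, 4.0))
```

Both coefficients start out equal, but the penalty barely touches the high-variance direction and halves the low-variance one, which is the sense in which ridge behaves like a soft version of dropping the small principal components.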
Yet another way of thinking about it, in the context of ReLU units: a layer of ReLUs forms a truncated hyperplane basis in feature space (like splines, but in higher dimensions), and regularisation induces smoothness in this N-dimensional basis by shrinking it towards a flat hyperplane.
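A one-dimensional sketch of the spline picture, assuming each ReLU has unit input weight so it reduces to a hinge `max(0, x - t)` with a kink at knot `t`. The names and numbers are made up for illustration, and the uniform scaling by 0.25 is only a stand-in for what a weight penalty does; real training would not shrink every coefficient equally.

```python
def relu(z):
    return max(0.0, z)

def relu_layer_1d(x, knots, coeffs):
    """f(x) = sum_k c_k * relu(x - t_k): piecewise linear, one kink per knot."""
    return sum(c * relu(x - t) for t, c in zip(knots, coeffs))

def segment_slopes(coeffs):
    """Slope of f just to the right of each knot (knots assumed sorted ascending)."""
    slopes, running = [], 0.0
    for c in coeffs:
        running += c
        slopes.append(running)
    return slopes

knots = [0.0, 1.0, 2.0]
coeffs = [2.0, -3.0, 4.0]
shrunk = [0.25 * c for c in coeffs]  # crude stand-in for the effect of a weight penalty

print(segment_slopes(coeffs))  # [2.0, -1.0, 3.0]
print(segment_slopes(shrunk))  # [0.5, -0.25, 0.75]
```

The slope change at each knot equals the coefficient on that hinge, so shrinking the coefficients makes every kink shallower and pulls the whole function towards a flat line.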