by pcwelder
1043 days ago
AFAIK, weight decay is inspired by L2 regularisation, which goes back to linear regression, where minimising the L2-penalised loss is equivalent to MAP estimation under a zero-mean Gaussian prior on the weights. Note that L1 regularisation produces much more sparsity, but it doesn't perform as well.
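For anyone who wants to check the Gaussian-prior equivalence numerically, here is a minimal sketch. It assumes a 1-D linear model with no bias, Gaussian noise of variance sigma^2 and a N(0, tau^2) prior on w. The function names are hypothetical. It compares the ridge solution, the MAP estimate, and gradient descent with a weight-decay step, and adds a 1-D lasso to show the sparsity point.

```python
import math

# Sketch under stated assumptions (not from the comment): 1-D linear model
# y = w*x + noise with no bias term, noise ~ N(0, sigma^2), prior w ~ N(0, tau^2).
# All function names are hypothetical, chosen for illustration.


def _sums(xs, ys):
    """Return (sum of x*y, sum of x*x) for the data."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy, sxx


def ridge(xs, ys, lam):
    """Minimiser of 0.5*sum((y - w*x)^2) + 0.5*lam*w^2 (L2 penalty)."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + lam)


def map_gaussian_prior(xs, ys, sigma, tau):
    """MAP estimate of w. Up to constants, the negative log-posterior is
    sum((y - w*x)^2) / (2*sigma^2) + w^2 / (2*tau^2); multiplying by sigma^2
    shows it is the ridge objective with lam = sigma^2 / tau^2."""
    sxy, sxx = _sums(xs, ys)
    return sxy / (sxx + sigma**2 / tau**2)


def weight_decay_gd(xs, ys, lam, lr=0.01, steps=2000):
    """Full-batch gradient descent where each step decays w, then follows
    the gradient g of the unpenalised loss 0.5*sum((y - w*x)^2).
    (1 - lr*lam)*w - lr*g equals w - lr*(g + lam*w): a plain gradient
    step on the L2-penalised loss."""
    w = 0.0
    for _ in range(steps):
        grad = -sum(x * (y - w * x) for x, y in zip(xs, ys))
        w = (1.0 - lr * lam) * w - lr * grad
    return w


def lasso_1d(xs, ys, lam):
    """Minimiser of 0.5*sum((y - w*x)^2) + lam*|w| (L1 penalty), via
    soft-thresholding: exactly zero whenever |sum(x*y)| <= lam."""
    sxy, sxx = _sums(xs, ys)
    return math.copysign(max(abs(sxy) - lam, 0.0), sxy) / sxx


if __name__ == "__main__":
    xs, ys = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
    print(ridge(xs, ys, lam=2.0))
    print(map_gaussian_prior(xs, ys, sigma=2.0, tau=math.sqrt(2.0)))  # lam = 4/2
    print(weight_decay_gd(xs, ys, lam=2.0))

    weak = [0.1, -0.2, 0.15, 0.05]  # weak signal: sum(x*y) is about 0.35
    print(lasso_1d(xs, weak, lam=1.0))  # L1: exactly zero
    print(ridge(xs, weak, lam=1.0))     # L2: shrunk, but nonzero
```

The first three prints should agree up to floating-point error. The decay-step/L2-gradient equivalence holds for plain gradient descent; with adaptive optimisers such as Adam the two no longer coincide.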
It's kind of amazing to watch this from the sidelines: engineers getting ridiculously impressive results from some combo of sheer hackery and ingenuity, great data pipelining and engineering, extremely large datasets, extremely fast hardware, and computational methods that scale very well, while at the same time gradually relearning lessons and reinventing techniques that statisticians perfected more than half a century ago.