|
|
|
|
|
by untrimmed
255 days ago
|
|
So all the classic optimization theory about staying in the stable region is basically what deep learning doesn't do. The model literally learns by becoming unstable, oscillating, and then using that energy to self-correct. The chaos is the point. What a crazy, beautiful mess. |
|
Researchers are constantly looking to train more expressive models more quickly. Any method which can converge + take large jumps will be chosen. You are sort of guaranteed to end up in a place where the sharpness is high but it somehow comes under control. If we weren't there.... we'd try a new architecture until we arrived there. So deep learning doesn't "do this", we do it using any method possible and it happened to be the various architectures that currently fit into "deep learning". Keep in mind many architectures which are deep do not converge - you see survivorship bias.