by dontwearitout
747 days ago
I haven't heard the term "shallow basin hypothesis", but I know what it refers to. These two papers spring to mind:

1) Loss Surfaces, Mode Connectivity, and Fast Ensembling of DNNs https://arxiv.org/abs/1802.10026

2) Visualizing the Loss Landscape of Neural Nets https://arxiv.org/abs/1712.09913

There's also a very interesting body of work on merging trained models, for example by interpolating between points in weight space, which relates to the concept of "basins" of similar solutions. If you're interested in learning more, skim the intro of this paper: https://arxiv.org/abs/2211.08403
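To make the interpolation idea a bit more concrete, here's a rough toy sketch (mine, not taken from any of the papers above). Walk the straight line θ(α) = (1 − α)·θ_A + α·θ_B between two weight vectors and measure how far the loss rises above a linear blend of the two endpoint losses. A barrier near zero suggests both solutions sit in the same basin; a large one means there's a ridge between them. The 2-D loss functions and the names `interpolate`/`loss_barrier` are hypothetical stand-ins for a real network's loss over its flattened weights, and this barrier definition is one common choice, not necessarily the exact one used in the linked work.

```python
from typing import Callable, List, Sequence

# A "loss" here is any function mapping a flat weight vector to a scalar.
Loss = Callable[[Sequence[float]], float]


def interpolate(theta_a: Sequence[float], theta_b: Sequence[float], alpha: float) -> List[float]:
    """Point on the straight line (1 - alpha) * theta_a + alpha * theta_b."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def loss_barrier(loss: Loss, theta_a: Sequence[float], theta_b: Sequence[float], steps: int = 21) -> float:
    """Largest rise of the loss above the linear blend of the endpoint losses.

    Close to 0: the two solutions are linearly connected (same basin).
    Large: the straight path between them crosses a ridge.
    """
    loss_a, loss_b = loss(theta_a), loss(theta_b)
    barrier = 0.0
    for i in range(steps):
        alpha = i / (steps - 1)
        baseline = (1.0 - alpha) * loss_a + alpha * loss_b
        barrier = max(barrier, loss(interpolate(theta_a, theta_b, alpha)) - baseline)
    return barrier


# Hypothetical toy losses over 2-D "weights".
def double_well(theta: Sequence[float]) -> float:
    # Minima at (-1, 0) and (1, 0), separated by a ridge at x = 0.
    return (theta[0] ** 2 - 1.0) ** 2 + theta[1] ** 2


def flat_valley(theta: Sequence[float]) -> float:
    # Zero everywhere along the x-axis: one wide, connected basin.
    return theta[1] ** 2


if __name__ == "__main__":
    print(loss_barrier(double_well, [-1.0, 0.0], [1.0, 0.0]))
    print(loss_barrier(flat_valley, [-1.0, 0.0], [1.0, 0.0]))
```

In the double well the midpoint of the segment lands on the ridge, so the barrier comes out at 1.0; along the flat valley it's 0.0. With real networks the interesting part is that two independently trained models often show a large barrier until their weights are aligned in some way, which is what a lot of the merging work is about.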
Reviewing the literature, I see the concept is more commonly referred to as "flat/wide minima"; e.g., https://www.pnas.org/doi/10.1073/pnas.1908636117
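As a crude illustration of "flat" vs "sharp": nudge the weights a fixed distance in random directions and average how much the loss goes up. This is a hypothetical random-perturbation sharpness proxy, not the specific measure from the PNAS paper; the radius, sample count, seed, and the two quadratic bowls are arbitrary assumptions chosen so both minima sit at the same point with zero loss and differ only in curvature.

```python
import math
import random
from typing import Callable, Sequence

Loss = Callable[[Sequence[float]], float]


def sharpness(loss: Loss, theta: Sequence[float], radius: float = 0.1,
              samples: int = 100, seed: int = 0) -> float:
    """Average loss increase when theta is moved `radius` away in a random direction."""
    rng = random.Random(seed)  # seeded so repeated calls give the same estimate
    base = loss(theta)
    total = 0.0
    for _ in range(samples):
        # A normalised Gaussian vector points in a uniformly random direction.
        direction = [rng.gauss(0.0, 1.0) for _ in theta]
        norm = math.sqrt(sum(d * d for d in direction))
        perturbed = [t + radius * d / norm for t, d in zip(theta, direction)]
        total += loss(perturbed) - base
    return total / samples


# Hypothetical toy losses: same minimum at the origin, very different curvature.
def sharp_bowl(theta: Sequence[float]) -> float:
    return 100.0 * sum(t * t for t in theta)


def flat_bowl(theta: Sequence[float]) -> float:
    return 0.01 * sum(t * t for t in theta)


if __name__ == "__main__":
    print(sharpness(sharp_bowl, [0.0, 0.0]))
    print(sharpness(flat_bowl, [0.0, 0.0]))
```

Both minima have zero training loss, but the sharp one scores roughly 1.0 and the flat one roughly 1e-4: a small shift in the weights (or in the data distribution) costs far more in the sharp minimum, which is the usual intuition for why flat/wide minima are thought to generalize better.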