by cs702
1312 days ago
> It's true that I don't go into detail about double descent, though I do describe how increasing capacity often reduces overfitting.

I agree.

> I believe the figure labeled "Figure 1" illustrates what you are suggesting (despite being labeled Figure 1, it is actually at the bottom of the blog post, so maybe easy to miss).

Easy to miss, yes. I'm not sure it illustrates the phenomenon, though. That plot shows extreme overfitting (i.e., interpolation) by the 10,000-parameter model. No one really understands what happens after interpolation. There is in fact some anecdotal evidence that, after crossing the interpolation threshold, large AI models trained with SGD gradually begin to ignore outliers and find simpler models (!) that generalize better (!). Counterintuitive, I know. This is an active area of research, with no good explanations yet, AFAIK.
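
For anyone who wants to poke at the interpolation threshold directly, here is a minimal sketch of a toy experiment. It is not the blog post's setup and it does not use SGD: it assumes synthetic linear data, random ReLU features, and a minimum-norm least-squares fit (via `np.linalg.lstsq`) as a stand-in. The function name, the widths, and the noise level are all hypothetical choices made for illustration. The only thing it is guaranteed to show is that the training error collapses once the number of features clearly exceeds the number of training points; what the test error does past that point is the part worth experimenting with.

```python
import numpy as np


def random_feature_errors(n_train=20, n_test=500, d=5,
                          widths=(5, 10, 20, 40, 200),
                          noise=0.1, seed=0):
    """Fit min-norm least squares on random ReLU features of varying width.

    Illustrative only: a toy stand-in for "model capacity", not an
    SGD-trained network. Returns a list of (width, train_mse, test_mse).
    """
    rng = np.random.default_rng(seed)

    # Synthetic data (an assumption): noisy linear target in d dimensions.
    w_true = rng.normal(size=d)
    x_train = rng.normal(size=(n_train, d))
    x_test = rng.normal(size=(n_test, d))
    y_train = x_train @ w_true + noise * rng.normal(size=n_train)
    y_test = x_test @ w_true

    results = []
    for p in widths:
        # Random, untrained first layer; only the linear readout is fitted.
        w_feat = rng.normal(size=(d, p)) / np.sqrt(d)
        phi_train = np.maximum(x_train @ w_feat, 0.0)
        phi_test = np.maximum(x_test @ w_feat, 0.0)

        # lstsq returns the minimum-norm solution when p > n_train,
        # i.e. one particular interpolating fit among infinitely many.
        coef, *_ = np.linalg.lstsq(phi_train, y_train, rcond=None)

        train_mse = float(np.mean((phi_train @ coef - y_train) ** 2))
        test_mse = float(np.mean((phi_test @ coef - y_test) ** 2))
        results.append((p, train_mse, test_mse))
    return results


if __name__ == "__main__":
    for p, train_mse, test_mse in random_feature_errors():
        print(f"width={p:4d}  train_mse={train_mse:.3e}  test_mse={test_mse:.3e}")
```

Sweeping `widths` more finely around `n_train` and averaging over several seeds is the natural next step if you want to look for a peak near the threshold. Whether the behavior of this linear toy says anything about large SGD-trained models is exactly the open question.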