by cs702
1312 days ago
Thank you. As you probably know, the big deal about double descent is that once sufficiently large AI models cross the so-called "interpolation threshold" in training and get over the hump, they start generalizing better -- the opposite of overfitting. In fact, state-of-the-art performance requires getting over the hump. As far as I can tell, your post does not mention any of that explicitly. Also, all your plots show only the classical overfitting curve, not the actual curve we now see all the time with larger AI models like Transformers.

I believe the figure labeled "Figure 1" illustrates what you are suggesting. Despite the label, it is actually at the bottom of the blog post, so it may be easy to miss.
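
For anyone who wants to see the curve discussed above without training a Transformer, here is a minimal sketch in one common toy setting: least-squares regression on random ReLU features, sweeping the number of features `p` past the number of training points. Everything here is an assumption made for illustration and is not taken from the blog post: the function name `double_descent_curve`, the linear target, the noise level, and the chosen widths. In this kind of setup, test error typically rises as `p` approaches `n_train` (the interpolation threshold) and falls again beyond it, but the exact shape depends on the seed and the noise level.

```python
import numpy as np

def double_descent_curve(n_train=40, n_test=500, d=5,
                         widths=(5, 10, 20, 40, 80, 400),
                         noise=0.1, seed=0):
    """Fit random-ReLU-feature regressions of increasing width.

    Hypothetical toy setup: returns a list of (p, train_mse, test_mse),
    where p is the number of random features.
    """
    rng = np.random.default_rng(seed)

    # Assumed ground truth: a noisy linear function of the inputs.
    w_true = rng.normal(size=d)
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    y_tr = X_tr @ w_true + noise * rng.normal(size=n_train)
    y_te = X_te @ w_true

    results = []
    for p in widths:
        # Fixed random first layer; only the linear readout is fitted.
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        F_tr = np.maximum(X_tr @ W, 0.0)
        F_te = np.maximum(X_te @ W, 0.0)

        # lstsq returns the minimum-norm solution when p > n_train,
        # i.e. the interpolating fit with the smallest weights.
        beta, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)

        train_mse = float(np.mean((F_tr @ beta - y_tr) ** 2))
        test_mse = float(np.mean((F_te @ beta - y_te) ** 2))
        results.append((p, train_mse, test_mse))
    return results

if __name__ == "__main__":
    for p, train_mse, test_mse in double_descent_curve():
        print(f"p={p:4d}  train MSE={train_mse:.3e}  test MSE={test_mse:.3e}")
```

The one thing that is guaranteed here is that training error drops to essentially zero once `p` exceeds `n_train`; what happens to test error past that point is the part worth plotting.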