by jaschasd
1310 days ago
Blog post author here. A brief note: I do discuss the deep double descent phenomenon in the blog. See the section starting with "One of the best understood causes of extreme overfitting is that the expressivity of the model being trained too closely matches the complexity of the proxy task." I avoided using the actual term double descent, since I thought it would add unnecessary complexity. Lesson learned for next time -- I should have at least had an endnote using that terminology!

As you probably know, the big deal about double descent is that once sufficiently large AI models cross the so-called "interpolation threshold" (the point at which they can fit the training data exactly) and get over the hump, they start generalizing better -- the opposite of overfitting. State-of-the-art performance in fact requires getting over the hump. As far as I can tell, you did not mention any of that explicitly anywhere in your post.
Also, all your plots show only the classical overfitting curve, not the actual curve we now see all the time with larger AI models like Transformers.
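
For anyone who wants to see the hump for themselves, here is a minimal sketch. It is not from the blog post, and every name and setting in it is an assumption chosen for illustration: a noisy linear target, random ReLU features of width `p`, and a minimum-norm least-squares fit via `np.linalg.pinv`. In setups like this, test error typically rises as `p` approaches the number of training points and falls again beyond it, though the exact shape depends on the seed and the noise level.

```python
import numpy as np


def random_feature_errors(widths, n_train=40, n_test=500, d=5, noise=0.5, seed=0):
    """Return a list of (width, train_mse, test_mse) tuples.

    Hypothetical toy setup: noisy linear target, random ReLU features,
    minimum-norm least-squares readout. All defaults are illustrative.
    """
    rng = np.random.default_rng(seed)
    w_true = rng.normal(size=d)

    X_train = rng.normal(size=(n_train, d))
    X_test = rng.normal(size=(n_test, d))
    y_train = X_train @ w_true + noise * rng.normal(size=n_train)  # noisy labels
    y_test = X_test @ w_true                                       # clean targets

    results = []
    for p in widths:
        # Fixed random first layer; only the linear readout is fitted.
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        F_train = np.maximum(X_train @ W, 0.0)
        F_test = np.maximum(X_test @ W, 0.0)

        # pinv gives the least-squares fit when p < n_train and the
        # minimum-norm interpolating fit when p > n_train.
        beta = np.linalg.pinv(F_train) @ y_train

        train_mse = float(np.mean((F_train @ beta - y_train) ** 2))
        test_mse = float(np.mean((F_test @ beta - y_test) ** 2))
        results.append((p, train_mse, test_mse))
    return results


if __name__ == "__main__":
    # The interpolation threshold in this toy setup sits near p = n_train = 40.
    for p, tr, te in random_feature_errors([5, 10, 20, 40, 80, 160, 640]):
        print(f"p={p:4d}  train_mse={tr:.4f}  test_mse={te:.4f}")
```

Sweeping `widths` across `n_train` and plotting `test_mse` against `p` is the quickest way to compare the result with the classical U-shaped curve.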