by Salgat 1731 days ago
To add to this, there's a misleading phenomenon where performance actually gets worse at first as you add more data/parameters/epochs, but oddly improves again if you throw even more at the model.

For the interested, this phenomenon is known as (deep) double descent:

https://openai.com/blog/deep-double-descent/

https://www.lesswrong.com/posts/FRv7ryoqtvSuqBxuT/understand...

(Edit: Oh, the definition appears in the abstract of the linked paper.)
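
If you want to poke at this yourself, here is a rough toy sketch, not the deep-net setup from the links above. It assumes a plain linear regression fit with minimum-norm least squares, where "model size" is just how many of the available features the fit is allowed to use. The function names (`heldout_mse`, `median_heldout_mse`) and all sizes are made up for illustration.

```python
import numpy as np

def heldout_mse(n_used, n_train=40, n_total=400, n_test=500, noise=0.5, seed=0):
    """Held-out MSE of min-norm least squares using only the first n_used features."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth: every one of the n_total features carries a little signal.
    beta = rng.normal(size=n_total) / np.sqrt(n_total)
    X_train = rng.normal(size=(n_train, n_total))
    X_test = rng.normal(size=(n_test, n_total))
    y_train = X_train @ beta + noise * rng.normal(size=n_train)
    y_test = X_test @ beta  # noiseless targets for evaluation
    # For n_used > n_train the system is underdetermined and lstsq
    # returns the minimum-norm solution that interpolates the training set.
    coef, *_ = np.linalg.lstsq(X_train[:, :n_used], y_train, rcond=None)
    pred = X_test[:, :n_used] @ coef
    return float(np.mean((pred - y_test) ** 2))

def median_heldout_mse(n_used, trials=20):
    """Median over several random draws, to smooth out single-seed noise."""
    return float(np.median([heldout_mse(n_used, seed=s) for s in range(trials)]))

if __name__ == "__main__":
    for n_used in (5, 10, 20, 40, 80, 200, 400):
        print(n_used, round(median_heldout_mse(n_used), 3))
```

In this setup I'd expect the held-out error to spike around n_used = n_train (40 here), where the model can only just interpolate the training data, and to come back down as more features are added. Treat it as an illustration of the model-size version of the effect only; it says nothing about the data or epoch versions.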

Is this the ML equivalent of the Dunning–Kruger effect? A model with a bit of data is too afraid of being wrong to be overconfident. A model with a bit more data is overconfident in itself and gets things wrong. Finally, a model with tons and tons of data understands the complexity of the problem set and once again becomes too afraid of being wrong.
Model confidence as reported by softmax probability scores is notoriously noisy and miscalibrated. With larger models and more data, the confidence estimation gets more nuanced.
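
For anyone wondering what "miscalibrated" means concretely: one common way to put a number on it is the expected calibration error (ECE), which bins predictions by their top softmax score and compares average confidence to actual accuracy in each bin. A minimal sketch, assuming equal-width bins and NumPy arrays of logits and integer labels; the function names and the default of 10 bins are my own choices here, not anything from the thread.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def expected_calibration_error(probs, labels, n_bins=10):
    """Weighted average of |accuracy - confidence| over equal-width confidence bins."""
    confidences = probs.max(axis=1)      # top softmax score per sample
    predictions = probs.argmax(axis=1)   # predicted class per sample
    correct = (predictions == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap   # weight by fraction of samples in the bin
    return float(ece)
```

A model that says 0.9 and is right 90% of the time scores near zero; one that says 0.9 and is right half the time scores about 0.4.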