To add to this, there's a misleading phenomenon that first occurs where the performance actually gets worse with too much data/parameters/epochs, but oddly improves again if you throw even more at the model.
Is this the ML equivalent of Dunning–Kruger effect? A model with a bit of data is too afraid of being wrong to be overconfident. A model with a bit more data is overconfident in itself and gets things wrong. Finally, a model with tons and tons of data understands the complexity of the problem set and once again becomes too afraid of being wrong.
Model confidence as reported by softmax probability scores is notoriously noisy and miscalibrated. With larger models and more data the confidence estimation gets more nuanced.
https://openai.com/blog/deep-double-descent/
https://www.lesswrong.com/posts/FRv7ryoqtvSuqBxuT/understand...
(Edit: Oh, the definition appears in the abstract of the linked paper.)