
Yeah, I don't mean to indicate that the model is bad. It's just that statistics is notoriously complicated (both in terms of the mathematics involved and in terms of intuitively understanding the impact of imperfect data and potential interactions, which is really hard), and most people really, really suck at it. Once you move from the maths to actually modelling data, you have to rely a lot on (niche) domain knowledge and experience.

I'm a bit biased because I work in a space with actual statisticians, but I'd wager that it's almost impossible for current LLMs to distinguish between good and bad examples in their training data. After all, even very smart humans fail to do that.