Hacker News new | ask | show | jobs
by gbickford 551 days ago
Small models don't "know" as much so they hallucinate more. They are better suited for generations that are based in a ground truth, like in a RAG setup.

A better comparison might be Flash 2.0 vs 4o-mini. Even then, the models aren't meant to have vast world knowledge, so benchmarking them on that isn't a great indicator of how they would be used in real-world cases.

1 comments

Yes, it's not an apples to apples comparison. My point is the position it's at on the lmarena leaderboard is misplaced due to the hallucination issues.