Hacker News
by unchar1 2 days ago
It's not just about figuring out whether a model is good at things in general, but whether it's good at the things I care about.

A targeted eval suite (much like a test suite) tells us exactly that.
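To make the test-suite analogy concrete, here is a minimal sketch, assuming the model can be wrapped as a plain `prompt -> text` function. Everything in it is hypothetical: `EvalCase`, `run_suite`, `pass_rate`, the two date-extraction cases, and `toy_model` (a regex stand-in, not a real model). Each case pairs a prompt from *your* task with a check on the output, the same way a unit test pairs an input with an assertion.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    name: str
    prompt: str
    check: Callable[[str], bool]  # True if the model's output is acceptable


def run_suite(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, bool]:
    """Run every case against the model and record pass/fail per case."""
    return {case.name: case.check(model(case.prompt)) for case in cases}


def pass_rate(results: dict[str, bool]) -> float:
    """Fraction of cases that passed (0.0 for an empty suite)."""
    return sum(results.values()) / len(results) if results else 0.0


# Hypothetical cases for a task you care about, e.g. extracting dates.
cases = [
    EvalCase("iso_date", "Extract the date: 'Due 2024-03-15.'",
             lambda out: "2024-03-15" in out),
    EvalCase("no_date", "Extract the date: 'No deadline.'",
             lambda out: out.strip().lower() == "none"),
]


# Stand-in "model" for illustration only; swap in a call to a real model.
def toy_model(prompt: str) -> str:
    match = re.search(r"\d{4}-\d{2}-\d{2}", prompt)
    return match.group(0) if match else "none"


results = run_suite(toy_model, cases)
print(results, pass_rate(results))
```

Comparing two models then means running both through the same `cases` and comparing pass rates on the tasks you actually have, rather than on a general benchmark.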