by iamwil 575 days ago
Lots of people can get impressive demos up and running, but if you want to run AI products in production, you're going to have to do system evals. System evals make sure your product does what it says on the box, even when the qualities you care about are hard to quantify.

We wrote a zine on system evals without jargon: https://forestfriends.tech

Eugene Yan has written extensively on evals: https://eugeneyan.com/writing/evals/

Hamel has as well: https://hamel.dev/blog/posts/evals/