Hacker News new | ask | show | jobs
by pbronez 73 days ago
Yup, agree. “Evaluations” = Tests

Gets pretty meta when you’re evaluating a model which needs to evaluate the output of another agent… gotta pin things down to ground truth somewhere.