Hacker News
by softwaredoug 97 days ago
Yeah, "evals" might be a better word: capturing all the stuff that proves your agent team builds the correct thing.