Hacker News
by HDBaseT 3 days ago
Aren't benchmarks exactly that?

"We used the AI to solve a given problem with x% adherence/quality/correctness"?