by DarkNova6 90 days ago
I'm never sure how much faith one can put in such benchmarks, but in any case the picture seems to shift once you look at pass@2 and pass@3.

Still, the more interesting comparison would be against something like Codex.