Hacker News
by gronky_ 303 days ago
Keep in mind that this isn’t about users - the top agents on the leaderboard aren’t running an actual product on the benchmark.

If they are running their production product as is, then of course whatever is built into the product is fine.