Hacker News new | ask | show | jobs
by nikcub 59 days ago
the most cited is terminal bench 2.0, but its also plagued by cheating accusations and benchmaxxing.

somewhat remarkably, claude code ranks last for Opus 4.6 - which may say something about cc, or say something about the benchmark

[0] https://www.tbench.ai/leaderboard/terminal-bench/2.0