by nikcub
59 days ago
the most cited is terminal bench 2.0, but it's also plagued by cheating accusations and benchmaxxing. somewhat remarkably, claude code ranks last for Opus 4.6, which may say something about cc, or something about the benchmark

[0] https://www.tbench.ai/leaderboard/terminal-bench/2.0