Hacker News
by SilverSlash 103 days ago
Interestingly, GPT-5.4 actually regressed on Terminal Bench 2.0 compared with GPT-5.3-Codex, by 2.2 percentage points:

GPT-5.4: 75.1%

GPT-5.3-Codex: 77.3%