Hacker News
by SilverSlash 103 days ago
Interestingly, GPT-5.4 actually regressed on Terminal Bench 2.0 compared with GPT-5.3-Codex:
GPT-5.4: 75.1%
GPT-5.3-Codex: 77.3%