Hacker News
by
Palmik
206 days ago
All evals on Terminal Bench require some harness. :) Or "Agent", as Terminal Bench calls it. Presumably the Gemini 3 results were obtained using Gemini CLI.
What do you mean by "standard eval harness"?
lucassz
206 days ago
I think the point is that it looks like Gemini 3 was only tested with the generic "Terminus 2", whereas Codex was tested with the Codex CLI.