Hacker News
by Amber-chen 52 days ago
I like the small-surface-area approach. The question I’d use to evaluate this is how well the harness records/replays tool calls and failure modes, since that is where debugging agent behavior usually gets messy.