Hacker News
sanderjd, 1 day ago
Aren't there benchmarks that measure at the harness level as well?
theshrike79, 1 day ago
How would you benchmark "agent harness communicates with user clearly"? That's 100% a feels measurement.
sanderjd, 1 day ago
I mean, in my experience some of this stuff is way closer to table-stakes things than that. Like "the tool call didn't get totally confused" more than "did the communication with the user feel good".