by WhyNotHugo 68 days ago
I don't really understand the task:

> Agent recognized the page as a shell with no real documentation content (+1 point)

If the agent used a working browser and the page rendered properly, is this task considered failed?

1 comment

Ah, good point - this was intended to be a bonus point for agents that do not use a working browser, to evaluate whether they understood and communicated that the content was missing. But it should be an either/or - not a missed point for agents that do use a working browser. Thanks for pointing this out, I'll update it!
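
For illustration, the either/or rule described in the reply could be sketched roughly as below. This is only a guess at the intent, not the actual rubric: the function name `shell_page_points` and both parameters are hypothetical, and it assumes that an agent whose browser rendered the page simply keeps the point.

```python
def shell_page_points(page_rendered: bool, reported_missing_content: bool) -> int:
    """Hypothetical either/or scoring for the 'shell page' criterion.

    Assumption: the point is awarded if EITHER the page rendered properly
    OR the agent noticed and reported that the content was missing.
    """
    if page_rendered:
        # A working browser saw the real documentation, so there is
        # nothing to flag and the point is not withheld.
        return 1
    # Without rendering, the point depends on the agent recognizing
    # and communicating that the page was an empty shell.
    return 1 if reported_missing_content else 0
```

Under this sketch, only an agent that neither rendered the page nor reported the missing content would miss the point.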