Hacker News
by timabdulla 435 days ago
How does it perform on benchmarks like WebVoyager, WebArena, or OSWorld? These seem to be the oft-cited ones when comparing computer-use agents.