by esafak 411 days ago
Is there a benchmark for this? If not, you ought to (crowd?)start one for everybody's sake.
1 comment

We started with browser-use because they had the best evals: https://browser-use.com/posts/sota-technical-report

But we found that Laminar has since come out with a better browser agent (and a better eval): https://www.lmnr.ai/ so we're looking to migrate over soon!