by arjunchint 136 days ago
In our benchmark of web agents, we found that vision/GUI-based agents get tripped up by popups and overlays, need large vision models, and require driving the browser over CDP (Chrome DevTools Protocol).

Our own DOM-only web agent, rtrvr.ai, works seamlessly underneath dialogs, can run on off-the-shelf Gemini Flash Lite, and uses native Chrome APIs, which in the benchmark led to minimal infrastructure failures, SOTA performance, and the lowest cost.

https://www.rtrvr.ai/blog/web-bench-results