by gooru 437 days ago
Thanks for reading and sharing your inputs.

1. I will try to record a video this week to answer this properly, since there is no short answer. If you’ve heard of Anthropic's computer-use, browser-use, or OpenAI's Operator, this takes a slightly improved approach, one demonstrated by Playwright MCP, which leverages the accessibility tree. In my testing, it worked well on a very clunky web app with many shadow DOMs and iframes.
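
For anyone unfamiliar with the accessibility-tree idea, here is a rough sketch of the general technique, not this library's actual code. It assumes the browser hands back a nested dict with `role`, `name`, and `children` keys (that shape and the `flatten_ax_tree` helper are hypothetical, for illustration only) and flattens it into short text lines that can be sent to a model instead of a screenshot.

```python
from typing import Any


def flatten_ax_tree(node: dict[str, Any], depth: int = 0) -> list[str]:
    """Render a nested accessibility-tree node as indented 'role "name"' lines."""
    role = node.get("role", "unknown")
    name = node.get("name", "")
    # Only append the quoted name when the node actually has one.
    line = "  " * depth + role + (f' "{name}"' if name else "")
    lines = [line]
    for child in node.get("children", []):
        lines.extend(flatten_ax_tree(child, depth + 1))
    return lines


# Hypothetical snapshot; a real one would come from the browser automation layer.
tree = {
    "role": "WebArea",
    "name": "Checkout",
    "children": [
        {"role": "textbox", "name": "Email"},
        {
            "role": "group",
            "name": "Payment",
            "children": [{"role": "button", "name": "Pay now"}],
        },
    ],
}

print("\n".join(flatten_ax_tree(tree)))
```

The point of the text form is that roles and names describe what an element is and does, so the model does not need pixels or raw HTML to pick a target.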

2. There is a built-in mechanism to deal with this. Once I hear more feedback on its performance, I can make improvements.

3. Right now, it does not use vision. The reason I didn't add vision in v1 is that I wanted to run tests with lower token costs and prove it works equally well without vision. I plan to add vision as an option for those who don’t mind the token cost.

*Summary*: The library is lightweight, which makes it easy to adjust and improve.