by gooru
437 days ago
Thanks for reading and sharing your input.

1. I'll try to record a video this week to answer this, since it doesn't have a short answer. If you've heard of Anthropic's computer-use, browser-use, or OpenAI's Operator, this takes a slightly improved approach: the one demonstrated by Playwright MCP, which leverages the accessibility tree. In my testing, it worked well on a very clunky web app with many shadow DOMs and iframes.

2. There is a built-in mechanism to deal with this. Once I hear more feedback on how it performs, I can make improvements.

3. Right now, it does not use vision. I left vision out of v1 because I wanted to run tests with lower token costs and prove it works equally well without it. I plan to add vision as an option for those who don't mind the token cost.

*Summary*: This is a lightweight library, which makes it easy to adjust and improve.
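
To make the accessibility-tree idea a bit more concrete, here is a rough sketch of how a tree might be flattened into a few lines of text for a model instead of sending a screenshot. This is not the library's actual code: the node shape (`role`, `name`, `children`), the `flatten_tree` helper, and the sample tree are all hypothetical, chosen only for illustration.

```python
def flatten_tree(node: dict, depth: int = 0) -> list[str]:
    """Return one indented text line per node that has a name.

    Assumes a hypothetical node shape: {"role": str, "name": str, "children": [...]}.
    Unnamed wrapper nodes are skipped, but their children are still visited.
    """
    lines = []
    role = node.get("role", "")
    name = node.get("name", "")
    if name:
        lines.append(f"{'  ' * depth}- {role} \"{name}\"")
    for child in node.get("children", []):
        lines.extend(flatten_tree(child, depth + 1))
    return lines


# Hypothetical tree for a small checkout page.
sample_tree = {
    "role": "WebArea",
    "name": "Checkout",
    "children": [
        {"role": "textbox", "name": "Email"},
        {
            "role": "generic",
            "name": "",
            "children": [{"role": "button", "name": "Pay now"}],
        },
    ],
}

snapshot = "\n".join(flatten_tree(sample_tree))
print(snapshot)
```

The resulting text is a handful of short lines, which is roughly why a text-only approach can keep token costs low compared with passing images to the model.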