Hacker News
by p0deje 483 days ago
Have you experimented with using text-only models and the DOM/accessibility tree for interaction with a website? I'm currently working on an open-source test automation tool (https://alumnium.ai), and the accessibility tree w/o screenshots works pretty well as long as the website provides decent support for ARIA attributes or at least has proper HTML5 structure.

On most pages, we don't need vision, and the DOM alone is sufficient. We have not worked with the accessibility tree yet, but it's a great idea to include that. Do you have any great resources on where to get started?
> On most pages, we don't need vision, and the DOM alone is sufficient.

I misunderstood, then. Looking at the demo videos, it seemed like you constantly update elements with borders/IDs, so I assumed that's what is then passed to vision.

> Do you have any great resources on where to get started?

A great place to start is https://chromium.googlesource.com/chromium/src/+/main/docs/a....
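
To make the "accessibility tree w/o screenshots" idea concrete, here is a rough sketch of the kind of text-only outline a model could be given instead of an image. It is a toy approximation and not how alumnium or any browser does it: a real accessibility tree is computed by the browser, while this only maps a handful of HTML5 tags and ARIA attributes to roles using Python's standard library. The names `outline`, `OutlineBuilder`, and `role_for` are hypothetical, as is the `[n] role "name"` output format.

```python
from html.parser import HTMLParser

VOID_TAGS = {"input", "img", "br", "hr", "meta", "link"}

# Tiny, incomplete subset of implicit roles; browsers apply many more rules.
IMPLICIT_ROLES = {
    "button": "button", "nav": "navigation", "main": "main",
    "h1": "heading", "h2": "heading", "h3": "heading", "img": "image",
}
# Roles whose accessible name may come from their text content.
NAME_FROM_CONTENT = {"button", "link", "heading"}


def role_for(tag, attrs):
    """Return an explicit ARIA role, an implicit one, or None."""
    if attrs.get("role"):
        return attrs["role"]
    if tag == "a":
        return "link" if "href" in attrs else None
    if tag == "input":
        kinds = {"text": "textbox", "checkbox": "checkbox", "submit": "button"}
        return kinds.get(attrs.get("type", "text"))
    return IMPLICIT_ROLES.get(tag)


class OutlineBuilder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.nodes = []  # flat list of role-bearing nodes in document order
        self.stack = []  # open elements as (tag, node or None)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        role = role_for(tag, attrs)
        node = None
        if role:
            # Depth counts only ancestors that carry a role themselves.
            depth = sum(1 for _, n in self.stack if n is not None)
            name = (attrs.get("aria-label") or attrs.get("alt")
                    or attrs.get("placeholder") or "")
            node = {"role": role, "name": name, "depth": depth, "text": []}
            self.nodes.append(node)
        if tag not in VOID_TAGS:
            self.stack.append((tag, node))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or tag not in [t for t, _ in self.stack]:
            return
        while self.stack.pop()[0] != tag:
            pass  # also drop unclosed children of the element being closed

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attach text to the innermost role-bearing ancestor, if that role
        # takes its name from content; other text is dropped in this sketch.
        for _, node in reversed(self.stack):
            if node is not None:
                if node["role"] in NAME_FROM_CONTENT:
                    node["text"].append(text)
                return


def outline(html):
    """Render the role-bearing elements of `html` as an indented text outline."""
    builder = OutlineBuilder()
    builder.feed(html)
    builder.close()
    lines = []
    for i, node in enumerate(builder.nodes, start=1):
        name = node["name"] or " ".join(node["text"])
        label = f' "{name}"' if name else ""
        lines.append(f'{"  " * node["depth"]}[{i}] {node["role"]}{label}')
    return "\n".join(lines)


PAGE = """
<nav aria-label="Primary">
  <a href="/docs">Docs</a>
  <a>not a link</a>
</nav>
<main>
  <h1>Sign in</h1>
  <input type="text" placeholder="Email">
  <button><span>Continue</span></button>
</main>
"""

print(outline(PAGE))
```

The numeric IDs give a text-only model something to refer back to ("click [6]") without any screenshot. As noted above, this only works as well as the page's ARIA attributes and HTML5 structure allow: the second `<a>` has no `href`, so it gets no role and disappears from the outline.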