Hacker News
by erichocean 240 days ago
How are you mapping from "click this element" (presumably obtained via a VLM) to the actual DOM locator that refers to it?

I guess Playwright can do it in "record" mode; I'm curious how you do it from a Chrome extension.

Spitballing here: do you inject an event filter on the page and, when the click happens, grab the element and run some code to synthesize a selector that refers to just that element? (Presumably you could reuse Playwright's element-to-locator code at that point.)

2 comments

So when you go into "selector" mode, the plugin adds event listeners to all the DOM nodes. Based on your click, it first tries to generate a bunch of selectors statically (multiple, both CSS- and XPath-based), and then, based on your guidance, it's the job of agent4 to make stable selectors.
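
A rough sketch of what a static pass like that could look like, in TypeScript. This is not the plugin's actual code: the `ElLike` interface and the `indexAmongSameTag`, `cssPath`, and `xPath` helpers are names made up for illustration, and the id handling assumes ids are plain identifiers (real code would want `CSS.escape`). It walks up from the clicked element and emits one CSS candidate and one XPath candidate, stopping early at the nearest ancestor with an id.

```typescript
// Minimal structural view of a DOM element (hypothetical helper type).
// A browser `Element` has these properties; plain objects work for testing.
interface ElLike {
  tagName: string;
  id: string;
  parentElement: ElLike | null;
  children: ArrayLike<ElLike>;
}

// 1-based position of `el` among siblings that share its tag name.
function indexAmongSameTag(el: ElLike): number {
  const parent = el.parentElement;
  if (!parent) return 1;
  let index = 0;
  for (let i = 0; i < parent.children.length; i++) {
    const sibling = parent.children[i];
    if (sibling.tagName === el.tagName) index++;
    if (sibling === el) return index;
  }
  return 1;
}

// CSS candidate: anchor on the nearest id, otherwise chain nth-of-type steps.
function cssPath(el: ElLike): string {
  const parts: string[] = [];
  for (let node: ElLike | null = el; node; node = node.parentElement) {
    if (node.id) {
      // Assumption: the id needs no escaping.
      parts.unshift(`#${node.id}`);
      break;
    }
    const tag = node.tagName.toLowerCase();
    parts.unshift(
      node.parentElement ? `${tag}:nth-of-type(${indexAmongSameTag(node)})` : tag
    );
  }
  return parts.join(" > ");
}

// XPath candidate built with the same walk.
function xPath(el: ElLike): string {
  const parts: string[] = [];
  for (let node: ElLike | null = el; node; node = node.parentElement) {
    if (node.id) {
      parts.unshift(`//*[@id="${node.id}"]`);
      return parts.join("/");
    }
    parts.unshift(`${node.tagName.toLowerCase()}[${indexAmongSameTag(node)}]`);
  }
  return "/" + parts.join("/");
}
```

Positional selectors like these are exactly the brittle kind, which is presumably why a second step is needed to pick or rewrite them into something stable.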
Use document.elementFromPoint to get the element at the click coordinates, then use an npm package similar to optimal-select to come up with a unique CSS selector.
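
For anyone who wants to see the shape of that approach, here is a minimal TypeScript sketch. It does not use optimal-select or reproduce its API; `candidateSelectors` is a hand-rolled stand-in, and `HitEl`, `DocLike`, and `uniqueSelectorAt` are hypothetical names. The only real DOM calls assumed are `elementFromPoint` and `querySelectorAll`, taken through a small interface so the logic can be exercised without a browser.

```typescript
// The few element properties this sketch reads (hypothetical helper type).
interface HitEl {
  tagName: string;
  id: string;
  className: string;
}

// The two document methods the sketch relies on (hypothetical helper type).
interface DocLike<E> {
  elementFromPoint(x: number, y: number): E | null;
  querySelectorAll(selector: string): ArrayLike<E>;
}

// Stand-in for a selector library: candidates ordered from most to least
// preferred. Assumption: ids and class names need no CSS escaping.
function candidateSelectors(el: HitEl): string[] {
  const tag = el.tagName.toLowerCase();
  const out: string[] = [];
  if (el.id) out.push(`#${el.id}`);
  const classes = el.className.split(/\s+/).filter(Boolean);
  if (classes.length) out.push(`${tag}.${classes.join(".")}`);
  out.push(tag);
  return out;
}

// Hit-test the point, then return the first candidate that matches exactly
// one element and that element is the one under the point.
function uniqueSelectorAt<E extends HitEl>(
  doc: DocLike<E>,
  x: number,
  y: number
): string | null {
  const el = doc.elementFromPoint(x, y);
  if (!el) return null;
  for (const sel of candidateSelectors(el)) {
    const matches = doc.querySelectorAll(sel);
    if (matches.length === 1 && matches[0] === el) return sel;
  }
  return null; // nothing simple is unique; a real tool would fall back to a path
}
```

In a page you would call it from a click handler with viewport coordinates, something like `uniqueSelectorAt(document, event.clientX, event.clientY)`. The uniqueness check against the live document is the important part; without it a selector can silently match a different element.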