Hacker News
by pverheggen 250 days ago
I feel like this could work if the selectors are chosen carefully to capture semantic meaning, rather than being based on something arbitrary like a class name. The agent must have some understanding of the document to be able to perform those actions in the first place.

If it can find an ellipse tool, it's likely based on some combination of accessible role, accessible name, and inner text (perhaps the icon too, if it's multi-modal). So in theory, couldn't it capture those criteria in a JS snippet and replay it?
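
To make that concrete, here's a rough sketch of what such a snippet could look like. It is not the plugin's actual code: the node shape (`role`, `name`, `text`, `children`), the function names, and the sample toolbar are all made up for illustration, and a plain object tree stands in for the DOM so the snippet runs on its own under Node.

```javascript
// Hypothetical recorded criteria: describe *what* the element is
// (role + accessible name), not where it sits or what class it has.
const ellipseTool = { role: "button", name: /ellipse/i };

// Check one node against the criteria. Fields left out of the criteria
// are ignored, so class names never take part in the match.
function matches(node, criteria) {
  if (criteria.role && node.role !== criteria.role) return false;
  if (criteria.name && !criteria.name.test(node.name ?? "")) return false;
  if (criteria.text && !criteria.text.test(node.text ?? "")) return false;
  return true;
}

// Depth-first search for the first node that satisfies the criteria.
function findBySemantics(root, criteria) {
  if (matches(root, criteria)) return root;
  for (const child of root.children ?? []) {
    const hit = findBySemantics(child, criteria);
    if (hit) return hit;
  }
  return null;
}

// Replay step: re-resolve the element at run time, then act on it.
// The "click" is just recorded in a log here (assumption for the demo).
function replayClick(root, criteria, log) {
  const target = findBySemantics(root, criteria);
  if (!target) throw new Error("no element matches the recorded criteria");
  log.push(`click:${target.id}`);
  return target;
}

// Made-up page: the class names are arbitrary and could change freely.
const page = {
  id: "toolbar", role: "toolbar", name: "Shapes", children: [
    { id: "t1", role: "button", name: "Rectangle tool", className: "x9f2" },
    { id: "t2", role: "button", name: "Ellipse tool", className: "a71c" },
  ],
};

const log = [];
replayClick(page, ellipseTool, log);
```

Since the match only looks at role and name, renaming `a71c` to anything else wouldn't break the replay, which is the whole point of picking semantic criteria over class names.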

1 comment

That's exactly what it is doing. The workflows are pretty much JS snippets themselves, which you can see in the "code" tab (in the plugin, when you select a saved workflow).