
Thanks! Sure:

Essentially, automation has gone through phases: from integration services, to browser automation, to RPA. The last phase, RPA (Robotic Process Automation, from services like UiPath), used computer-vision image matching to target elements to click or scrape. UiPath's innovation was using computer vision. Before that, browser automation targeted elements with code selectors in the HTML, like CSS selectors and XPath. All of these solutions share a fatal flaw: when popular services update their designs, you have to go back and rebuild the automation and all of its targets.
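To make the fatal flaw concrete, here is a minimal sketch of a hard-coded selector breaking after a redesign. The markup, class names, and the `find_target` helper are made up for illustration and are not taken from any real site or tool; it uses only the limited XPath support in Python's standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup before and after a redesign (not from any real site).
OLD_PAGE = "<div><button class='tweet-btn'>Tweet</button></div>"
NEW_PAGE = "<div><button class='post-btn-v2'>Post</button></div>"

# A hard-coded XPath-style selector, as classic browser automation would use.
SELECTOR = ".//button[@class='tweet-btn']"


def find_target(page: str):
    """Return the matched element, or None if the selector no longer matches."""
    return ET.fromstring(page).find(SELECTOR)


# The selector matches the old design but finds nothing after the redesign,
# so the automation has to be rebuilt by hand.
print(find_target(OLD_PAGE) is not None)
print(find_target(NEW_PAGE) is not None)
```

The button is still on the page and still does the same thing; only the markup the selector depended on changed.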

We invented "Semantic Targets" in 2022 after trying to solve the end-to-end problem using just GPT-3. Semantic targets allow targeting elements using English and reasoning, so you can build future-proof targets that still work when services update their designs. The other cool feature is that you can now add logical reasoning to these targets. For example:

"Only scrape the funny tweets" or "Only scrape the tweets with the word Cheat Layer" or "If there is Cheat Layer in any tweet, say only 'yes'"

It took a year+ to build a multimodal model that calculated the probability that each element matched the intent, but now modern models like Gemini can do this (GPT-4 can't target precise coordinates).

So if your target is "post button", then even if Twitter changes the button's color, moves it, or even changes the word "post", the automation can still find it on the screen and click it.
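As a rough illustration of the "probability each element matched the intent" idea, here is a toy sketch, not our actual model. The element list, `toy_score`, `match_probabilities`, and `pick_target` are all hypothetical names; `toy_score` just counts shared words and stands in for a multimodal model, so unlike a real model it would not survive the label itself changing.

```python
import math

# Hypothetical on-screen elements; in practice these would come from a
# screenshot or the DOM, described by a model rather than by hand.
ELEMENTS = [
    {"id": "e1", "description": "text field what is happening"},
    {"id": "e2", "description": "blue button labeled post"},
    {"id": "e3", "description": "link to settings"},
]


def toy_score(intent: str, description: str) -> float:
    """Stand-in for a model: counts words shared by intent and description."""
    shared = set(intent.lower().split()) & set(description.lower().split())
    return float(len(shared))


def match_probabilities(intent, elements, score_fn=toy_score):
    """Softmax over raw scores, giving one probability per element."""
    scores = [score_fn(intent, e["description"]) for e in elements]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {e["id"]: x / total for e, x in zip(elements, exps)}


def pick_target(intent, elements, score_fn=toy_score):
    """Return the id of the element most likely to match the intent."""
    probs = match_probabilities(intent, elements, score_fn)
    return max(probs, key=probs.get)


print(pick_target("post button", ELEMENTS))
```

The point of the design is that the target is the English intent, not a selector, so `score_fn` can be swapped for a stronger model without rebuilding the automation.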

We're pretty sure all automation tools will eventually use this, since it seems like a no-brainer.

Here are more details: https://docs.cheatlayer.com/fundamentals/agentic-process-aut...