by GPUboy 654 days ago
Based on our git commit logs, I'm pretty sure we were the first to invent this concept of "semantic targets". Essentially we 100% replaced previous selector strategies like XPath and CSS selectors in 2022, and we were the first startup approved by OpenAI to sell GPT-3 for automation, in August 2021. We've been working on this problem for 3 years and just open sourced some of our work. We're looking for contributors who could help with evals for generalized agents, and help push the unique state machine that appears to be state-of-the-art in forward-solving open-ended problems.
1 comment

This sounds amazing! Can you explain how semantic targeting works?
Thanks! Sure:

Essentially, automation has gone through phases: from integration services, to browser automation, to RPA. The last phase, RPA (Robotic Process Automation, from services like UiPath), used computer vision on images to target elements to click or scrape. UiPath's innovation was using computer vision. Before that, browser automation used code selectors in HTML, like CSS selectors and XPath. All of these solutions have a fatal flaw for automation: when popular services update their designs, you have to go back and rebuild the automation and all of its targets.
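
To illustrate that fatal flaw, here is a minimal Python sketch. The page snapshots, class names, and the find_by_class helper are all hypothetical and made up for illustration; a real browser would evaluate an actual CSS selector, but the failure mode is the same.

```python
# Hypothetical page snapshots: each element is a dict of attributes.
before_redesign = [
    {"tag": "button", "class": "btn-tweet", "text": "Tweet"},
    {"tag": "a", "class": "nav-home", "text": "Home"},
]
# The same page after a redesign: the class was renamed and the label changed.
after_redesign = [
    {"tag": "button", "class": "css-1x9f post-cta", "text": "Post"},
    {"tag": "a", "class": "nav-home", "text": "Home"},
]


def find_by_class(elements, class_name):
    """Mimic a CSS class selector: match one class token exactly."""
    for element in elements:
        if class_name in element["class"].split():
            return element
    return None


# The selector was written against the old markup.
found_before = find_by_class(before_redesign, "btn-tweet")
found_after = find_by_class(after_redesign, "btn-tweet")

print(found_before)  # the old button is found
print(found_after)   # nothing matches, so the automation has to be rebuilt
```

The button is still on the page after the redesign; only the hook the selector depended on is gone.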

We invented "Semantic Targets" in 2022 after trying to solve the end-to-end problem using just GPT-3. Semantic targets allow targeting elements using English and reasoning, so you can build future-proof targets that still work when services update their designs. The other cool feature is that you can now add logical reasoning to these targets. For example,

"Only scrape the funny tweets" or "Only scrape the tweets with the word Cheat Layer" or "If there is Cheat Layer in any tweet, say only 'yes'"

It took a year+ to build a multimodal model that calculated the probability that each element matched the intent, but now modern models like Gemini can do this (GPT-4 can't target precise coordinates).

So if your target is "post button", then even if Twitter changes the color, moves the button, or even changes the word "post", the automation can still find it on the screen and click it.
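
A toy Python sketch of the "probability each element matches the intent" idea follows. This is not the actual Cheat Layer model: the SYNONYMS table, toy_score, and pick_target are hypothetical names, and the keyword scorer is only a stand-in for the multimodal model, which would look at the screen itself rather than a hand-written word list.

```python
import math

# Hypothetical stand-in for what a real model knows about meaning.
SYNONYMS = {"post": {"post", "tweet", "publish", "send"}}


def toy_score(intent, element):
    """Crude relevance score; a real system would call a multimodal model here."""
    label_words = set(element["text"].lower().split())
    score = 0.0
    for word in intent.lower().split():
        if word == element["role"]:
            score += 1.0  # the intent names the element's role, e.g. "button"
        if label_words & SYNONYMS.get(word, {word}):
            score += 2.0  # the label matches the word or a known synonym
    return score


def pick_target(intent, elements, score_fn=toy_score):
    """Softmax the scores into probabilities and return the most likely element."""
    scores = [score_fn(intent, element) for element in elements]
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(elements)), key=lambda i: probs[i])
    return elements[best], probs[best]


# Hypothetical screen contents before and after a redesign.
old_layout = [
    {"role": "button", "text": "Tweet"},
    {"role": "link", "text": "Home"},
    {"role": "button", "text": "Search"},
]
new_layout = [
    {"role": "link", "text": "Home"},
    {"role": "button", "text": "Search"},
    {"role": "button", "text": "Post"},
]

old_target, old_prob = pick_target("post button", old_layout)
new_target, new_prob = pick_target("post button", new_layout)
print(old_target["text"], round(old_prob, 2))
print(new_target["text"], round(new_prob, 2))
```

The same English target resolves to the right button in both layouts, even though the label and position changed. Swapping score_fn for a call to a real model is the part that does the heavy lifting in practice.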

We're pretty sure all automation tools will eventually use this, since it seems like a no-brainer.

Here are more details: https://docs.cheatlayer.com/fundamentals/agentic-process-aut...