by suchintan
517 days ago
I'd love to chat to see how we can help! Here's my email: suchintan@skyvern.com. We're working on two major improvements that will bring costs down at scale:
1. We're building a code generation layer under the hood that will start to memorize the actions Skyvern has taken on a website, so repeated runs will be nearly free.
2. We're exploring some graph re-ranking techniques to eliminate useless elements from the HTML DOM when analyzing the page. For example, if you're looking at a product page and want to add the product to your cart, the likelihood that you'll need to interact with the Reviews page is zero, so there's no need to send that context along to the LLM.
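The pruning idea in item 2 can be illustrated with a minimal sketch. This is not Skyvern's re-ranking technique: it swaps in a much simpler heuristic, assuming a toy, hypothetical `Node` model of the DOM and scoring each node by the fraction of goal keywords in its text. A subtree is dropped when neither it nor any descendant reaches the threshold, so ancestors of relevant elements survive while, for example, a reviews section does not.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy DOM node (hypothetical model): tag name, own visible text, children."""
    tag: str
    text: str = ""
    children: list[Node] = field(default_factory=list)


def relevance(node: Node, goal_terms: set[str]) -> float:
    """Fraction of goal keywords found in this node's own text."""
    if not goal_terms:
        return 0.0
    words = set(node.text.lower().split())
    return len(words & goal_terms) / len(goal_terms)


def prune(node: Node, goal_terms: set[str], threshold: float) -> tuple[Node | None, float]:
    """Return (pruned copy or None, best score anywhere in the subtree).

    A node survives if it, or any descendant, scores at least `threshold`;
    irrelevant children are left out of the copy.
    """
    best = relevance(node, goal_terms)
    kept: list[Node] = []
    for child in node.children:
        pruned_child, child_best = prune(child, goal_terms, threshold)
        best = max(best, child_best)
        if pruned_child is not None:
            kept.append(pruned_child)
    if best < threshold:
        return None, best
    return Node(node.tag, node.text, kept), best


# Usage: keep the "Add to cart" path, drop the reviews section.
page = Node("body", children=[
    Node("div", children=[Node("h1", "iPhone 16"), Node("button", "Add to cart")]),
    Node("section", children=[Node("h2", "Reviews"), Node("p", "Great phone, five stars")]),
])
pruned, score = prune(page, {"add", "cart"}, threshold=0.5)
```

Keyword overlap is only a placeholder; a real ranker would presumably also weigh element roles, attributes, and the structure around each node (labels, siblings) before deciding what the LLM never needs to see.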
Computer vision is useful and very quick; however, in my experience, parsing the stacking context is much more useful. The problem is building the stacking context when a news site embeds a YouTube or Bluesky post: it requires injecting a script into each embed using Playwright. (Not mine, but prior art: [0].)
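For readers unfamiliar with stacking contexts, here is a minimal sketch of the per-element check, assuming the element's computed style has already been collected into a dict (for example by a script calling `getComputedStyle`, evaluated by Playwright in each frame). The helper name and its parameters are hypothetical, and it covers only a subset of the CSS rules: `will-change`, `contain`, container queries, and the top layer are left out.

```python
from __future__ import annotations

# Properties whose computed value, when not "none", creates a stacking context (subset).
_NONE_TRIGGERS = ("transform", "filter", "backdrop-filter", "perspective",
                  "clip-path", "mask", "mask-image")


def creates_stacking_context(style: dict[str, str],
                             parent_display: str = "block",
                             is_root: bool = False) -> bool:
    """Hypothetical helper: does an element with this computed style form a stacking context?"""
    if is_root:  # the root element always forms the root stacking context
        return True
    position = style.get("position", "static")
    z_index = style.get("z-index", "auto")
    if position in ("fixed", "sticky"):
        return True
    if position in ("absolute", "relative") and z_index != "auto":
        return True
    # Flex and grid items with a z-index form one even when not positioned.
    if parent_display in ("flex", "inline-flex", "grid", "inline-grid") and z_index != "auto":
        return True
    if float(style.get("opacity", "1")) < 1:
        return True
    if style.get("mix-blend-mode", "normal") != "normal":
        return True
    if style.get("isolation", "auto") == "isolate":
        return True
    return any(style.get(prop, "none") != "none" for prop in _NONE_TRIGGERS)


# Usage: a sticky header forms a stacking context; a plain relative wrapper does not.
print(creates_stacking_context({"position": "sticky"}))    # True
print(creates_stacking_context({"position": "relative"}))  # False
```

Because an embedded iframe holds a separate document (often cross-origin), a check like this has to run inside each frame's own document, which is why the script has to be injected into every embed rather than run once from the top-level page.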
In my free time, I've been quietly solving a problem I ran into while building browser agents, one that had no solution two years ago. Most webpages consist of several independent global execution contexts, and I'm developing a coherent way to get them all to talk to each other [1].
> "Go to Amazon.com and add an iPhone 16, a screen protector, and a case to cart"
Are you familiar with Google Dialogflow [2]? It is a service that returns an object containing an intent and parameters, which makes it easy to map a request to automation actions. I asked ChatGPT to help with an example of how Dialogflow might handle your request [3].
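A minimal sketch of mapping an intent to automation steps, assuming a dict shaped like the `queryResult` of a Dialogflow ES `detectIntent` response (using its `intent.displayName` and `parameters` fields). The intent name `add_to_cart`, the `site` and `product` parameters, and the step format are all hypothetical; a real agent would drive the browser instead of returning a list of steps.

```python
from __future__ import annotations

from typing import Any, Callable

Step = dict[str, str]


def add_to_cart(site: str, products: list[str]) -> list[Step]:
    """Hypothetical automation plan: open the site, then search and add each product."""
    steps: list[Step] = [{"action": "goto", "url": f"https://{site}"}]
    for product in products:
        steps.append({"action": "search", "query": product})
        steps.append({"action": "click", "target": "add-to-cart"})
    return steps


# Dispatch table from (hypothetical) intent display names to handlers.
HANDLERS: dict[str, Callable[[dict[str, Any]], list[Step]]] = {
    "add_to_cart": lambda params: add_to_cart(params["site"], params["product"]),
}


def plan_from_query_result(query_result: dict[str, Any]) -> list[Step]:
    """Look up the handler for the detected intent and pass it the extracted parameters."""
    intent = query_result["intent"]["displayName"]
    handler = HANDLERS.get(intent)
    if handler is None:
        raise ValueError(f"no automation mapped for intent {intent!r}")
    return handler(query_result["parameters"])


# Usage with a hand-written result for the request quoted above.
query_result = {
    "queryText": "Go to Amazon.com and add an iPhone 16, a screen protector, and a case to cart",
    "intent": {"displayName": "add_to_cart"},
    "parameters": {"site": "amazon.com", "product": ["iPhone 16", "screen protector", "case"]},
}
plan = plan_from_query_result(query_result)
```

Keeping the dispatch table separate from intent detection means the language-understanding side can change without touching the automation code, and an unmapped intent fails loudly instead of triggering the wrong action.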
[0] https://github.com/andreadev-it/stacking-contexts-inspector
[1] https://news.ycombinator.com/item?id=42576240
[2] https://cloud.google.com/dialogflow/es/docs/intents-overview
[3] https://chatgpt.com/share/678ae18d-5370-8004-97d4-f9949887b0...