

by philipbjorge 243 days ago

We had a similar realization here at Thoughtful and pivoted towards code generation approaches as well.

I know the authors of Skyvern are around here sometimes -- how do you think about combining code generation with vision-based approaches to agentic browser use, like OpenAI's Operator, Claude Computer Use, and Magnitude?

From my POV, the vision-based approaches are superior, but they are less amenable to codegen.

2 comments

suchintan 242 days ago

Unrelated, but Thoughtful gave us some very, very helpful feedback early in our journey. We are big fans!


suchintan 242 days ago

I think they're complementary, and that's the direction we're headed.

We can ask the vision-based models to output why they are doing what they are doing, and fall back to code-based approaches for subsequent runs.
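A rough sketch of that idea, not Skyvern's actual implementation: the first run goes through a vision agent that records each action along with its stated reason, and later runs replay the recorded actions as plain code. Every name here (`Action`, `ScriptCache`, `run_task`, the `vision_agent` and `replay` callables) is hypothetical, and re-running the vision agent when a replay fails is my assumption rather than something stated above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Action:
    kind: str      # e.g. "click" or "type"
    selector: str  # element the action targets
    reason: str    # the model's stated reason for taking this action


class ScriptCache:
    """Hypothetical store mapping a task to the actions recorded for it."""

    def __init__(self) -> None:
        self._scripts: Dict[str, List[Action]] = {}

    def get(self, task: str) -> Optional[List[Action]]:
        return self._scripts.get(task)

    def put(self, task: str, actions: List[Action]) -> None:
        self._scripts[task] = actions

    def drop(self, task: str) -> None:
        self._scripts.pop(task, None)


def run_task(
    task: str,
    cache: ScriptCache,
    vision_agent: Callable[[str], List[Action]],
    replay: Callable[[List[Action]], bool],
) -> str:
    """Run a task and report which path handled it: "code" or "vision"."""
    script = cache.get(task)
    if script is not None:
        # Subsequent runs: replay the recorded actions without the model.
        if replay(script):
            return "code"
        # Assumption: a failed replay means the page changed, so discard it.
        cache.drop(task)
    # First run (or after a failed replay): let the vision agent act and
    # record its actions, including the reasons, for future runs.
    actions = vision_agent(task)
    cache.put(task, actions)
    return "vision"
```

With stub callables, the first call to `run_task` for a given task returns `"vision"` and the second returns `"code"`, as long as the replay succeeds. The recorded `reason` strings are what would let a codegen step turn the raw actions into something more robust than a fixed click sequence.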
