by philipbjorge 243 days ago
We had a similar realization here at Thoughtful and pivoted towards code generation approaches as well.

I know the authors of Skyvern are around here sometimes -- how do you think about code generation with vision-based approaches to agentic browser use, like OpenAI's Operator, Claude Computer Use, and Magnitude?

From my POV, the vision-based approaches are superior, but they are less amenable to codegen.

2 comments

Unrelated, but Thoughtful gave us some very helpful feedback early in our journey. We are big fans!
I think they're complementary, and that's the direction we're headed.

We can ask the vision-based models to output why they are doing what they are doing, and fall back to code-based approaches for subsequent runs.
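
To make that concrete, here is a rough sketch of the record-then-replay idea, not how Skyvern actually implements it. Everything in it is hypothetical: `Step`, `ReplayError`, `run_task`, and the `vision_agent` / `execute` callables are stand-ins for whatever the real agent and browser driver would be. It also assumes that when a replay breaks, the run goes back to the vision model and re-records.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One recorded action (hypothetical shape)."""
    action: str  # e.g. "click"
    target: str  # e.g. a selector the vision model settled on
    reason: str  # the model's stated rationale for this step


class ReplayError(Exception):
    """Raised by `execute` when a recorded step no longer works."""


def run_task(
    task: str,
    cache: Dict[str, List[Step]],
    vision_agent: Callable[[str], List[Step]],
    execute: Callable[[Step], None],
) -> str:
    """Replay recorded steps if we have them; otherwise use the vision agent."""
    steps = cache.get(task)
    if steps is not None:
        try:
            for step in steps:
                execute(step)
            return "replayed"
        except ReplayError:
            # Assumption: the page changed, so the recording is stale.
            del cache[task]
    # Vision pass: the agent performs the task and reports what it did and why.
    cache[task] = vision_agent(task)
    return "vision"
```

The first call for a task goes through the vision agent and stores its steps along with the reasons; later calls replay the stored steps without calling the model. Keeping the `reason` next to each step is what would let a code generator (or a human) understand the recording well enough to turn it into a script.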