|
|
|
|
|
by timabdulla
514 days ago
|
|
OpenAI is merely matching SOTA in browser tasks as compared to existing browser-use agents. It is a big improvement over Claude Computer Use, but it is more of the same in the specific domain of browser tasks when comparing against browser-use agents (which can use the DOM, browser-specific APIs, and so on.) The truth is that while 87% on WebVoyager is impressive, most of the tasks are quite simple. I've played with some browse-use agents that are SOTA and they can still get very easily confused with more complex tasks or unfamiliar interfaces. You can see some of the examples in OpenAI's blog post. They need to quite carefully write the prompts in some instances to get the thing to work. The truth is that needing to iterate to get the prompt just right really negates a lot of the value of delegating a one-off task to an agent. |
|