by mertunsall
362 days ago
In browser-use, we combine vision with browser extraction, and we find that this gives the most reliable agent: https://github.com/browser-use/browser-use :) We recently gave the model access to a file system so that it never forgets what it's supposed to do, and we already have a ton of users who are very happy with the recent reliability updates! We also have a beta, workflow-use, which is basically what's mentioned in the comments here about "caching" a workflow: https://github.com/browser-use/workflow-use Let us know what you guys think. We are shipping hard and fast!
|
browser-use is still strongly coupled to the DOM for interaction because of the set-of-marks approach it uses (for context, those are the little rainbow boxes you see around the elements). This makes it very difficult to get it to reliably perform interactions beyond straightforward click/type, such as drag and drop or interacting with a canvas.
Since we interact based purely on what we see on the screen, using pixel coordinates, those sorts of interactions are a lot more natural for us and work much more reliably. If you don't believe me, I encourage you to try to get both Magnitude and browser-use to drag and drop cards on a Kanban board :)
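To make the pixel-coordinate point concrete, here is a minimal Python sketch of a drag expressed purely as raw mouse events, with no DOM element involved. It is not Magnitude's actual implementation: `MouseEvent`, `drag_events`, and the `steps` parameter are hypothetical names made up for illustration, and the straight-line interpolation is just one simple way to do it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MouseEvent:
    """A hypothetical low-level mouse event at a screen pixel."""
    kind: str  # "move", "down", or "up"
    x: int
    y: int


def drag_events(start, end, steps=10):
    """Build the mouse events for dragging from `start` to `end`.

    Both points are (x, y) pixel coordinates. The pointer moves to the
    start, presses, travels in a straight line over `steps` intermediate
    moves, then releases at the end point.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    events = [MouseEvent("move", x0, y0), MouseEvent("down", x0, y0)]
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the path covered so far
        events.append(
            MouseEvent("move", round(x0 + (x1 - x0) * t), round(y0 + (y1 - y0) * t))
        )
    events.append(MouseEvent("up", x1, y1))
    return events


if __name__ == "__main__":
    # Drag a card 300 px to the right, e.g. from one Kanban column to the next.
    for event in drag_events((100, 200), (400, 200), steps=3):
        print(event)
```

A list like this could be replayed through whatever mouse API the automation driver exposes (Playwright's `page.mouse.move`, `page.mouse.down`, and `page.mouse.up`, for example). The intermediate moves matter because many drag-and-drop implementations only react to a stream of move events, not a single jump from start to end.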
Regardless, best of luck!