by qeternity
528 days ago
I don't really follow: a lot of the fragility of web automation comes from the differences between the programmatic and visual views of a page, which VLMs are able to overcome. Skipping the graphical rendering seems like committing yourself to non-visual hell. The web isn't made for agents and automation. It's made for people.

Depending solely on a VLM is indeed reminiscent of how humans interact with the web, but when a model thrives on more data, why restrict the data sent to it?