Hacker News
by chatmasta 16 days ago
I’d like to see less focus on Playwright and more focus on giving the agent more than just an MCP server for browser automation. Make it multi-modal, figure out how to optimize when to send screenshots to which model, etc. Current coding harnesses are awful at any UI automation because they’re just automating DevTools and occasionally screenshotting. It’s obviously robotic, it’s slow, and it’s ineffective, which makes it difficult for the agent to validate the success of code changes.

Generalized computer use is what will ultimately solve this, but I think there’s real intermediate value in optimizing browser workflows specifically, by combining remote browser automation with multi-modal browser use.