| We're open-sourcing Lumen, a state-of-the-art, vision-first browser agent. Problem: Browser automation is fragile. Scripts break constantly, and agents waste tokens getting stuck in loops. Today there are two options. Selector-based scripting tools such as Playwright and Puppeteer require you to target specific DOM elements. First-generation browser agents (Stagehand, browser-use) offer natural-language interfaces but still resolve instructions into selectors under the hood. Selector-based scripts can break every time the UI changes, so you end up maintaining selectors instead of building features. First-gen agents inherit the same brittleness, especially when they pick the wrong element. Solution: Lumen is vision-first. It sees the screen and acts like a human: every natural-language instruction resolves into an x,y coordinate on the screen. Three layers of stuck detection keep it on track, and a dual-history system with context compaction lets it handle 20+ step workflows without blowing up the context window. We ran a WebVoyager eval (25 tasks across 15 sites, scored by an LLM judge, 3 trials per task, all frameworks on Claude Sonnet 4.6): Lumen: 100% success rate, 77.8s avg time, ~104K tokens. browser-use: 100% success rate, 109.8s avg time. Stagehand: 76% success rate, 207.8s avg time, ~200K tokens. Lumen matches browser-use on accuracy while completing tasks in ~30% less time, and beats Stagehand on every metric. Get Started: Start using Lumen today. Docs: https://lumen.omlabs.xyz/ Support us:
GitHub star: https://github.com/omxyz/lumen |