Show HN: UI testing using multimodal LLMs
(kodefreeze.com)
1 point
by kodefreeze
153 days ago
Hi HN, I built this tool to solve the "flakiness" problem in UI testing. Existing AI agents often struggle with precise interactions, while traditional frameworks (Selenium/Playwright) break whenever the DOM changes.

The approach: instead of relying on hard-coded selectors or pure computer vision, I’m using a multi-agent system powered by multimodal LLMs. I pass both the screenshot (pixels) and the browser context (network requests, console logs, etc.) to the model. This lets the agent "see" the UI like a user and map semantic intent ("Click the Signup button") to precise coordinates, even if the layout shifts.

The goal is to mimic natural user behavior rather than follow a predefined script. The tool handles exploratory testing and finds visual bugs that code-based assertions miss.

I’d love feedback on the implementation, or to discuss the challenges of using LLMs for deterministic testing.
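
For anyone who wants something concrete to poke at, here is a minimal sketch of the "screenshot plus browser context in, click coordinates out" step. It is an illustration under stated assumptions, not the tool's actual code: `BrowserContext`, `build_prompt`, `parse_click_target`, the payload field names, and the JSON reply format are all hypothetical, and the model call itself is left out so the example stays provider-agnostic.

```python
import base64
import json
from dataclasses import dataclass


@dataclass
class BrowserContext:
    """Hypothetical container for the non-visual signals sent with a screenshot."""
    url: str
    viewport: tuple[int, int]      # (width, height) in CSS pixels
    console_logs: list[str]
    network_requests: list[str]


def build_prompt(intent: str, screenshot_png: bytes, ctx: BrowserContext) -> dict:
    """Bundle pixels and browser context into one request payload.

    The field names are assumptions; adapt them to whatever model API is used.
    """
    return {
        "instruction": (
            f"Locate the element for: {intent!r}. "
            'Reply with JSON only: {"x": <int>, "y": <int>}.'
        ),
        "image_base64": base64.b64encode(screenshot_png).decode("ascii"),
        "context": {
            "url": ctx.url,
            "viewport": list(ctx.viewport),
            # Keep only the most recent entries so the prompt stays small.
            "console_logs": ctx.console_logs[-20:],
            "network_requests": ctx.network_requests[-20:],
        },
    }


def parse_click_target(reply: str, viewport: tuple[int, int]) -> tuple[int, int]:
    """Parse the model's JSON reply and reject coordinates outside the viewport."""
    data = json.loads(reply)
    x, y = int(data["x"]), int(data["y"])
    width, height = viewport
    if not (0 <= x < width and 0 <= y < height):
        raise ValueError(f"target ({x}, {y}) is outside the {width}x{height} viewport")
    return x, y


if __name__ == "__main__":
    ctx = BrowserContext(
        url="https://example.com/",
        viewport=(1280, 720),
        console_logs=[],
        network_requests=["GET /api/session 200"],
    )
    payload = build_prompt("Click the Signup button", b"placeholder-bytes", ctx)
    print(sorted(payload))
    # A hard-coded reply stands in for the model's answer here.
    print(parse_click_target('{"x": 640, "y": 312}', ctx.viewport))
```

Validating the reply against the viewport is one cheap guard for the determinism problem: a malformed or out-of-bounds answer fails loudly instead of turning into a stray click.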