|
|
|
|
|
by embedding-shape
123 days ago
|
|
Best I did was having instructions for it to use webdriver + browser screenshots, then I have baseline screenshots of how I want it to look, and instruct the agent to match against the screenshots and continue until implementation is aligned with the screenshots. Typically I use Figma to create those screenshots, then let the agent go wild as long as it manages to get the right results. Once first implementation is done, then go through all the code and come up with a proper software design and architecture, and refactor everything to be proper code basically, again using the screenshot comparison to make sure there are no regressions. |
|
What if instead of feeding the actual and expected screenshots into the model we fed in a visual diff between the images along with a scalar quantity that indicates magnitude of difference? Then, an agent harness could quantify how close a certain run is and maybe step toward success autonomously.
That said, if you have the skills to produce the desired final design as a raster image, I'd argue you have already solved the hard part. Manually converting a high quality design into css is ~trivial with modern web.