Hacker News new | ask | show | jobs
by bob1029 123 days ago
> I have baseline screenshots of how I want it to look, and instruct the agent to match against the screenshots

What if instead of feeding the actual and expected screenshots into the model we fed in a visual diff between the images along with a scalar quantity that indicates magnitude of difference? Then, an agent harness could quantify how close a certain run is and maybe step toward success autonomously.

That said, if you have the skills to produce the desired final design as a raster image, I'd argue you have already solved the hard part. Manually converting a high quality design into css is ~trivial with modern web.

1 comments

> What if instead of feeding the actual and expected screenshots into the model we fed in a visual diff between the images along with a scalar quantity that indicates magnitude of difference?

It does this by itself when needed, using imagemagick (in my case), also seen it create bounding boxes and measuring colors with impromptu opencv python scripts, so doesn't seem like it's needed to explicitly prompt for this, seems to do it when needed.

> Manually converting a high quality design into css is ~trivial with modern web.

Well, OP asked for "UI development" and not how the UI is first thought of, so figured I focus on the development part. How the UI is first created before the development is a different thing altogether, and current LLMs are absolutely awful at it, they seem to not even understand basics like visual hierarchy as far as I can tell.

I've really struggled with CC's "direction sense". I had a problem that was analogous to this. I had a picture of a PCB I wanted to figure out. So I instructed CC to create an overlay over each component and we would work through them to identify what they were to build an overall picture of what the device was doing. Any and all attempts to get CC to accurately place bounding boxes around components completely failed. What I ended up having to do was have CC create an interface where I could draw my own boxes around components, and it had no problem categorizing them and following along after that.

I've not tried to do any "pixel perfect" designs with CC outside of that. Generally I'm fine with the default UI it generates which tends to be some vague "modernish" sort of look.