HN Mirror


	by embedding-shape 123 days ago
	The best I've managed was giving it instructions to use WebDriver plus browser screenshots. I have baseline screenshots of how I want it to look, and I instruct the agent to compare against them and keep going until the implementation matches. Typically I use Figma to create those screenshots, then let the agent go wild as long as it gets the right results. Once the first implementation is done, I go through all the code, come up with a proper software design and architecture, and refactor everything into proper code, again using the screenshot comparison to make sure there are no regressions.
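A minimal sketch of that compare-and-iterate loop, in Python. Assumptions (none of this is from the comment): screenshots are already decoded into same-sized 2D lists of RGB tuples (in practice you would capture them through WebDriver and load them with an image library), "aligned" means the fraction of differing pixels is at or below a tolerance, and `render` and `apply_fix` are hypothetical callbacks standing in for the browser capture and the agent's edit step.

```python
from typing import Callable, List, Tuple

# Assumed representation: rows of (R, G, B) tuples, all rows the same width.
Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def mismatch_fraction(actual: Image, expected: Image) -> float:
    """Fraction of pixels that differ at all between two same-sized images."""
    if len(actual) != len(expected) or any(
        len(a) != len(e) for a, e in zip(actual, expected)
    ):
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in expected)
    differing = sum(
        1
        for row_a, row_e in zip(actual, expected)
        for pa, pe in zip(row_a, row_e)
        if pa != pe
    )
    return differing / total if total else 0.0


def iterate_until_aligned(
    render: Callable[[], Image],         # hypothetical: capture a screenshot
    apply_fix: Callable[[float], None],  # hypothetical: let the agent edit the code
    baseline: Image,
    tolerance: float = 0.0,
    max_rounds: int = 10,
) -> Tuple[bool, int, float]:
    """Render, compare with the baseline, and request fixes until within tolerance.

    Returns (aligned, rounds_used, last_score).
    """
    score = 1.0
    for round_no in range(1, max_rounds + 1):
        score = mismatch_fraction(render(), baseline)
        if score <= tolerance:
            return True, round_no, score
        apply_fix(score)  # feed the score back so the next round can improve
    return False, max_rounds, score
```

After the refactoring pass, running `mismatch_fraction` against every baseline again serves as the regression check the comment describes.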

bob1029 123 days ago

> I have baseline screenshots of how I want it to look, and instruct the agent to match against the screenshots

What if instead of feeding the actual and expected screenshots into the model we fed in a visual diff between the images along with a scalar quantity that indicates magnitude of difference? Then, an agent harness could quantify how close a certain run is and maybe step toward success autonomously.
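A sketch of the proposed diff-plus-scalar signal, again assuming same-sized 2D lists of RGB tuples. The scalar here is a normalized mean absolute channel difference (0.0 means identical, 1.0 means every channel is maximally different). That is one arbitrary choice of metric, roughly in the spirit of ImageMagick's `compare -metric MAE`, not something the comment specifies.

```python
from typing import List, Tuple

# Assumed representation: rows of (R, G, B) tuples with 0-255 channels.
Image = List[List[Tuple[int, ...]]]


def visual_diff(actual: Image, expected: Image) -> Tuple[Image, float]:
    """Return a per-pixel absolute-difference image and a scalar magnitude in [0, 1]."""
    if len(actual) != len(expected) or any(
        len(a) != len(e) for a, e in zip(actual, expected)
    ):
        raise ValueError("images must have the same dimensions")
    diff: Image = []
    channel_sum = 0
    channel_count = 0
    for row_a, row_e in zip(actual, expected):
        diff_row = []
        for pa, pe in zip(row_a, row_e):
            d = tuple(abs(ca - ce) for ca, ce in zip(pa, pe))
            diff_row.append(d)
            channel_sum += sum(d)
            channel_count += len(d)
        diff.append(diff_row)
    # Mean absolute channel difference, normalized by the 0-255 range.
    magnitude = channel_sum / (255 * channel_count) if channel_count else 0.0
    return diff, magnitude
```

A harness could track `magnitude` across runs and keep a change only if it goes down, while the diff image (black wherever the screenshots agree) is what would be shown to the model.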

That said, if you have the skills to produce the desired final design as a raster image, I'd argue you have already solved the hard part. Manually converting a high quality design into css is ~trivial with modern web.

embedding-shape 123 days ago

> What if instead of feeding the actual and expected screenshots into the model we fed in a visual diff between the images along with a scalar quantity that indicates magnitude of difference?

It already does this by itself when needed, using ImageMagick in my case. I've also seen it draw bounding boxes and measure colors with impromptu OpenCV Python scripts, so it doesn't seem necessary to prompt for this explicitly; it does it on its own when needed.
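The kind of impromptu script described above might look like this pure-Python sketch, which skips the ImageMagick/OpenCV dependencies. It finds the bounding box of pixels that differ by more than a threshold and measures the average color inside that box. The function names, the `(left, top, right, bottom)` box convention with exclusive right/bottom edges, and the image representation are all assumptions for illustration.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
# (left, top, right, bottom); right and bottom are exclusive.
BBox = Tuple[int, int, int, int]


def diff_bbox(actual: Image, expected: Image, threshold: int = 0) -> Optional[BBox]:
    """Bounding box of pixels whose largest channel difference exceeds threshold.

    Assumes same-sized images; zip() stops at the shorter row or image otherwise.
    """
    xs: List[int] = []
    ys: List[int] = []
    for y, (row_a, row_e) in enumerate(zip(actual, expected)):
        for x, (pa, pe) in enumerate(zip(row_a, row_e)):
            if max(abs(ca - ce) for ca, ce in zip(pa, pe)) > threshold:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None  # no pixel differs by more than the threshold
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1


def mean_color(image: Image, box: BBox) -> Pixel:
    """Average (R, G, B) inside a box, rounded to the nearest integer per channel."""
    left, top, right, bottom = box
    pixels = [image[y][x] for y in range(top, bottom) for x in range(left, right)]
    if not pixels:
        raise ValueError("empty box")
    n = len(pixels)
    r, g, b = (round(sum(p[c] for p in pixels) / n) for c in range(3))
    return r, g, b
```

Comparing `mean_color(actual, box)` with `mean_color(expected, box)` turns a raw diff into a concrete "this region should be white but is red" signal.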

> Manually converting a high quality design into css is ~trivial with modern web.

Well, OP asked about "UI development", not about how the UI is first conceived, so I figured I'd focus on the development part. Designing the UI before development starts is a different thing altogether, and current LLMs are absolutely awful at it; as far as I can tell, they don't even seem to understand basics like visual hierarchy.

tstrimple 123 days ago

I've really struggled with CC's "direction sense". I had an analogous problem: a picture of a PCB I wanted to figure out. I instructed CC to create an overlay over each component so we could work through them, identify what each one was, and build an overall picture of what the device was doing. Every attempt to get CC to place accurate bounding boxes around the components failed completely. What I ended up doing was having CC build an interface where I could draw my own boxes around components; after that, it had no problem categorizing them and following along.

I haven't tried any "pixel perfect" designs with CC outside of that. Generally I'm fine with the default UI it generates, which tends to have a vague "modernish" look.
