HN Mirror


by Michelangelo11 312 days ago

This is interesting but, speaking frankly, I see many seemingly insurmountable issues. Here are some:

- Contests will often be won not by the entry that best adheres to the prompt, but by the best-looking one. This happened in a contest I got as a recent example, with the input prompt "Build a brutalist website to a typeface maker." The winning entry had megawatt-bright magenta and yellow, which shouldn't appear anywhere near brutalism, and in other design aspects had almost no connection to brutalism either -- but it was the most attractive of the bunch.

- The approach only gets you to a local maximum. Current LLMs aren't very good designers, as you say, so contests will involve picking between mostly middling entries. You'd want a design that's, say, a 9 or a 10 on a 10-point scale -- but some 95% of the entry distribution will probably be between 5.5 and 7.5 or so, and that's what users will get to pick from.

2 comments

j_da 312 days ago

All great points. A limitation with human feedback is that once you start asking for more than binary preferences (e.g. multiple rankings or written feedback), the quality of the feedback decreases. For instance, humans can often give a quick answer on which option they prefer, but when asked "why" they prefer one thing over the other, they might not be able to fully explain it in language. How best to collect and incorporate the most useful types of feedback is still very much an open area of research.

I definitely agree with your second point. One idea we're experimenting with is adding a human baseline, in which the models are benchmarked against human generated designs as well.


grace77 312 days ago

yes! to the second point, someone in our Show HN proposed encouraging human designers to compete in submissions as well - we tried implementing this and found that, at least right now, LLMs are still so bad at design that beating them is trivial for a human - our plan right now is to focus more on this once it becomes more of a challenge and therefore, hopefully, more interesting/entertaining
