by Michelangelo11
312 days ago
This is interesting but, speaking frankly, I see many seemingly insurmountable issues. Here are some:

- Contests will often be won not by the entry that best adhered to the prompt, but by the best-looking one. This happened in a contest I got as a recent example, with the input prompt "Build a brutalist website to a typeface maker." The winning entry had megawatt-bright magenta and yellow, which shouldn't appear anywhere near brutalism, and in other design aspects had almost no connection to brutalism either -- but it was the most attractive of the bunch.
- The approach only gets you to a local maximum. Current LLMs aren't very good designers, as you say, so contests will involve picking between mostly middling entries. You'd want a design that's, say, a 9 or a 10 on a 10-point scale -- but some 95% of the entry distribution will probably be between 5.5 and 7.5 or so, and that's what users will get to pick from.

I definitely agree with your second point. One idea we're experimenting with is adding a human baseline, in which the models are also benchmarked against human-generated designs.