Hacker News new | ask | show | jobs
by tripzilch 51 days ago
IMHO looks more like a stork, not a pelican. Look up any image of an actual pelican and check the ratio of legs to body. IMHO that's a weird mistake to make when asked for a "pelican".

Have you considered asking a couple of artists on Fiverr or something to draw you a picture with the same prompt? I don't mean this as a gotcha, it's actual advice, you should probably get a sense of what a real human artist/designer (or three) would do with this prompt.

For example, I hope you will find that: One reasoning choice is wrong with this picture that's not much to do with its ability to draw. Do we enlarge the pelican to human size? Or do we shrink the bike to pelican size? There is only one answer that keeps pelican proportions. Draw a pelican on a very tiny bike, and its legs will just fit without making it a different species, and you can even sort of cover part of the steer under the wings, etc etc.

I'm curious if other artists would come up with the same or other solutions, but they should in general come up with solutions, which I haven't seen the LLM do, really.

You (or maybe others?) said that the "pelican on a bike" prompt is good because "there is no right answer" cause you can't really fit a pelican on a bike. But most artists will say "hold my beer" and figure it out anyway. Cartoonists won't even have to think. The "figuring out" of these problems is what I'm missing in the LLMs response. It just put a pelican on a bike and makes it look like a stork if necessary. I don't really feel like it's actually testing for the thing this prompt is designed for, unless the test still says "FAIL" for each and all of them, including the one you just called "excellent".

1 comments

Honestly it never crossed my mind to waste some artist's time with this, but now that the joke "benchmark" has somehow reached orbital velocity maybe I should be thinking about it!

I've run the prompt through dozens of dedicated image generation models so I've seen many versions of this that are better attempts than a text model spitting out SVG - here's gpt-image-2 as a recent example: https://chatgpt.com/share/69ea21ab-8738-83e8-a4d7-67374d84e0...

I believe that if you pay them for their time, it's not really "wasted", at least not nearly as "wasted" as when the next person would pay them to design some vapid advertisement.

In addition to that 1) it's for science and 2) maybe you owe it to yourself to have a really nice framed picture of a pelican riding a bicycle on the wall :D

About the dedicated image generation results, I still would have made the bicycle smaller, but it starts to depend on how motivated the artist is to make both the bike and pelican accurate. Which is fine, but if you want to have a benchmark, it's important to have at least one "known good" example, I think.