Hacker News
by zone411 795 days ago
I ran the same quick prompt-adherence and composition test on which Google's ImageFX slightly outperforms DALL-E 3 (https://www.astralcodexten.com/p/open-thread-315/comment/493...):

1. "A stained glass picture of a woman in a library with a raven on her shoulder with a key in its mouth"

2 out of 20 tries

2. "An oil painting of a man in a factory looking at a cat wearing a top hat"

First try

3. "A digital art picture of a child riding a llama with a bell on its tail through a desert"

0 out of 20 tries

4. "A 3D render of an astronaut in space holding a fox wearing lipstick"

2 out of 20 tries

5. "Pixel art of a farmer in a cathedral holding a red basketball"

First try

So it's roughly on par with those two models, much better than previous versions of SD, and better than Midjourney v6.


You're essentially mixing units: "2 out of 20" isn't comparable to "first try". I would have liked to see you run each prompt 20 times and add notes like "this got it right on the first try", which could also have been luck. If it got 1 out of 20 but happened to get it right on the first try, is that better or worse than 2 out of 20?
Like I said, it's a quick test, not a benchmark. The original question was about getting at least one out of 10 right (https://www.astralcodexten.com/p/a-guide-to-asking-robots-to...). Feel free to run them yourself; it takes 5 minutes.
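The disagreement here is about comparing "k out of n" counts with a bare "first try", and about the "at least one out of 10" criterion. A minimal Python sketch can put both on one scale, assuming each generation is an independent attempt with a fixed per-prompt success probability (an assumption neither commenter states). The helper names are hypothetical. The add-one estimate (Laplace's rule of succession) is a swapped-in technique for small samples, not something used in the thread.

```python
# Hypothetical helpers for comparing the thread's results on one scale.
# Assumption: each generation is an independent try with a fixed success probability p.

def point_estimate(hits: int, tries: int) -> float:
    """Observed success rate, e.g. 2 out of 20 -> 0.1."""
    return hits / tries


def laplace_estimate(hits: int, tries: int) -> float:
    """Rule of succession, (hits + 1) / (tries + 2): never returns exactly 0 or 1."""
    return (hits + 1) / (tries + 2)


def p_at_least_one(p: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` independent tries."""
    return 1 - (1 - p) ** attempts


if __name__ == "__main__":
    # "first try" is treated as 1 hit in 1 attempt, since the run stopped there.
    results = {
        "2 out of 20": (2, 20),
        "1 out of 20": (1, 20),
        "0 out of 20": (0, 20),
        "first try": (1, 1),
    }
    for label, (hits, tries) in results.items():
        p = point_estimate(hits, tries)
        smoothed = laplace_estimate(hits, tries)
        print(
            f"{label}: rate={p:.2f}, smoothed={smoothed:.2f}, "
            f"P(>=1 in 10 | smoothed)={p_at_least_one(smoothed, 10):.2f}"
        )
```

Under these assumptions, 2 out of 20 gives a raw rate of 0.10 and roughly a 65% chance of at least one hit in 10 tries, while a lone first-try success has a raw rate of 1.00 but a smoothed estimate of only about 0.67. That gap is why the two kinds of results don't compare directly. With independent tries, the total hit count is all that matters, so a run of 1 out of 20 that happens to succeed first estimates the same rate as any other 1 out of 20.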
> "A 3D render of an astronaut in space holding a fox wearing lipstick"

... Who's supposed to be wearing the lipstick, the fox or the astronaut?