| HN Mirror

by snowfield 235 days ago

I'd assume that behind the scenes the models generate several candidates and only show the user the best one. That would be smart, as it makes their model seem better than the others.
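For readers unfamiliar with the idea, generating several candidates and keeping the best one is usually called best-of-N sampling. A minimal sketch follows, assuming a generator and a scoring function are available; `toy_generate` and `toy_score` are hypothetical stand-ins, not any hosted model's actual pipeline, and nothing here confirms that any service works this way.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    generate: Callable[[str, random.Random], str],
    score: Callable[[str, str], float],
    prompt: str,
    n: int = 4,
    seed: int = 0,
) -> Tuple[float, str]:
    """Generate n candidates for the prompt and return (score, candidate) for the best."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    scored = [(score(prompt, c), c) for c in candidates]
    # Keep only the highest-scoring candidate; the rest are discarded.
    return max(scored, key=lambda pair: pair[0])


# Hypothetical stand-ins: a real system would call an image model and a
# learned quality or preference scorer here.
def toy_generate(prompt: str, rng: random.Random) -> str:
    return f"{prompt}#{rng.randint(0, 999)}"


def toy_score(prompt: str, candidate: str) -> float:
    return float(candidate.rsplit("#", 1)[1])


if __name__ == "__main__":
    print(best_of_n(toy_generate, toy_score, "a walrus", n=8))
```

Note the cost: every extra candidate is another full generation, so N candidates cost roughly N times as much to serve.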

It's also pretty obvious that the models have some built-in system prompt rules that give the final output a certain style. They seem very consistent.

It also looks like 4o has the temperature turned way down to ensure maximum prompt adherence, while Midjourney etc. seem to run at a higher temperature: more interesting end results, flourishes, complex materials and backgrounds.
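To make "temperature" concrete: in token-by-token sampling, it divides the logits before the softmax, so low values concentrate probability on the top choice and high values spread it out. The sketch below shows only that general mechanism with made-up logits. Whether 4o's image generation uses it this way is an assumption, and diffusion-based models may not have a literal temperature setting at all.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after scaling them by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.5]  # made-up values for illustration
    for t in (0.2, 1.0, 1.5):
        probs = softmax_with_temperature(logits, t)
        print(t, [round(p, 3) for p in probs])
```

Lower temperature means more predictable, "safer" picks; higher temperature means more variety at the cost of adherence.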

Also, what's with 4o's sepia tones? Post-editing in the generation workflow?

I don't believe any of these just generate the image, though. There are likely several steps in each workflow to present the final image to the user in the absolute best light.

2 comments

simonw 235 days ago

You can run some image models locally if you want to prove to yourself how well they can do with just a single generation from a prompt with no extra steps.

I've done this enough to suspect that most hosted image models don't increase their running costs to try and get better results through additional passes without letting the user know what they are doing.

Many of the LLM-driven models do implement a form of prompt rewriting though (since prompting image models effectively is really hard). Some notes on how DALL-E 3 did that here: https://simonwillison.net/2023/Oct/26/add-a-walrus/

phi-go 235 days ago

There are numbers on how many tries it took. I would also find the individual prompts and images interesting.