by vtail 107 days ago
Thanks - and no, I haven't seen this one. I like how they have the edit mode dashboard - show the original image + two edits; I was thinking about doing something like this.

I'm also a bit surprised they have gpt-image-1.5 so high above Nano Banana 2 - my limited testing shows that, at least for the visual styles, people like Nano Banana more.


Yeah, I think that's part of the issue with a single "squashed" comparative metric. Some users are going to grade higher based on overall visual fidelity, and others are going to value following the prompt.

For a point of reference, I run a pretty comprehensive image model comparison site heavily weighted in favor of prompt adherence.

https://genai-showdown.specr.net

EDIT: FWIW, I agree with your assessment. OpenAI's models have always been very strong in prompt adherence but visually weak (gpt-image-1 had the famous "piss filter" until they finally pushed out gpt-image-1.5)
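
To make the "squashed metric" point concrete, here's a toy sketch of how a single weighted score can flip a ranking depending on how much weight goes to prompt adherence versus visual fidelity. The model names, the scores, and the `rank` helper are all made up for illustration; none of this is real benchmark data or how any particular leaderboard works.

```python
# Made-up scores on a 0-1 scale, purely for illustration (not real data).
SCORES = {
    "model-x": {"adherence": 0.9, "fidelity": 0.6},
    "model-y": {"adherence": 0.7, "fidelity": 0.9},
}


def rank(scores: dict, adherence_weight: float) -> list:
    """Rank models by one combined score; the remaining weight goes to fidelity."""
    fidelity_weight = 1.0 - adherence_weight

    def combined(name: str) -> float:
        s = scores[name]
        return adherence_weight * s["adherence"] + fidelity_weight * s["fidelity"]

    # Highest combined score first.
    return sorted(scores, key=combined, reverse=True)


# A grader who mostly cares about following the prompt.
adherence_first = rank(SCORES, adherence_weight=0.8)
# A grader who mostly cares about how the image looks.
fidelity_first = rank(SCORES, adherence_weight=0.2)
```

With these made-up numbers the two rankings come out in opposite order, which is the problem with collapsing both criteria into one number.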

Very cool site - I think I saw it before here on HN, and I liked it a lot.

Did you manually review all the edit results yourself, or do you have some kind of automated procedure?

Thanks. So I have a bespoke python program that basically does this:

- Takes the platonic set of prompts

- Uses model-specific tuning directives with LLMs to create a bunch of prompt variations, so each model gets a diverse set of natural language expressions to "roll" generations from

But I still have to manually review each of the final images - which is pretty time-consuming. I've tried automating it using VLMs (like Qwen3-VL) but unfortunately they can miss the small details and didn't provide as much value as I was hoping.
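
For anyone curious, the two steps above could look roughly like the sketch below. This is not the actual program: `MODEL_DIRECTIVES`, `stub_llm`, `make_variations`, and `build_review_queue` are hypothetical names, the directives are invented, and the LLM call is replaced by a deterministic stub so the example runs without any API. The output is a queue of items still waiting for manual review.

```python
import random
from typing import Callable

# Hypothetical per-model tuning directives; the real ones are not in this thread.
MODEL_DIRECTIVES = {
    "model-a": "Rewrite the prompt as one dense descriptive sentence.",
    "model-b": "Rewrite the prompt as a short list of comma-separated tags.",
}


def stub_llm(directive: str, prompt: str, seed: int) -> str:
    """Stand-in for a real LLM call: only prepends a seeded random prefix.

    The directive is ignored here; a real LLM would use it to shape the rewrite.
    """
    rng = random.Random(seed)
    prefixes = ["", "A photo of ", "An illustration of ", "Please render "]
    return rng.choice(prefixes) + prompt


def make_variations(prompt: str, model: str, n: int,
                    llm: Callable[[str, str, int], str] = stub_llm) -> list:
    """Create n prompt variations for one model using its tuning directive."""
    directive = MODEL_DIRECTIVES[model]
    return [llm(directive, prompt, seed) for seed in range(n)]


def build_review_queue(prompts: list, models: list, n: int) -> list:
    """Expand every base prompt into per-model variations awaiting manual review."""
    queue = []
    for prompt in prompts:
        for model in models:
            for variation in make_variations(prompt, model, n):
                queue.append({
                    "base_prompt": prompt,
                    "model": model,
                    "variation": variation,
                    "reviewed": False,  # the human review step is still manual
                })
    return queue


base_prompts = ["a red bicycle leaning on a blue door", "three cats on a sofa"]
queue = build_review_queue(base_prompts, ["model-a", "model-b"], n=3)
```

Swapping `stub_llm` for a real client only requires a function with the same three arguments; image generation and the review itself are left out of the sketch.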