Hacker News
by echelon 440 days ago
GPT-4o image generation is the future of all image gen.

Every other player (Black Forest Labs' Flux, Stability.ai's Stable Diffusion, and even closed models like Ideogram and Midjourney) is on the path to extinction.

Image generation and editing must be multimodal. Full stop.

Google Imagen will probably be the first model to match the capabilities of 4o. I'm hoping one of the open weights labs or Chinese AI giants will release a model that demonstrates similar capabilities soon. That'll keep the race neck and neck.

1 comments

One very important distinction between image models is the implementation: 4o is autoregressive, slow, and extremely expensive.

Although the Ghibli trend is market validation, I suspect that competitors may not want to copy it just yet.

Extremely expensive in what sense? In that it costs $0.03 instead of $0.00003? Yeah, it's far more expensive relative to other solutions, but in absolute terms it's still very cheap for the vast majority of use cases. And it's a LOT better.

Dall-E is already 4-8 cents per image. Afaik this is not in the API yet, but I wouldn't be surprised if it's $1 or more.

> 4o is autoregressive, slow, and extremely expensive.

If you factor in the time wasted on prompting and inpainting, it's well worth it.