by alittletooraph2 440 days ago
I tried to make Studio Ghibli-inspired images, presumably using their new models. It was ass.

Llama is not an image-generating model. Any interface that uses Llama and generates images is calling out to a separate image generator as a tool, like OpenAI used to do with ChatGPT and DALL-E up until a couple of weeks ago: https://simonwillison.net/2023/Oct/26/add-a-walrus/

GPT-4o image generation is the future of all image gen.

Every other player (Black Forest Labs' Flux, Stability.ai's Stable Diffusion, and even closed models like Ideogram and Midjourney) is on the path to extinction.

Image generation and editing must be multimodal. Full stop.

Google Imagen will probably be the first model to match the capabilities of 4o. I'm hoping one of the open-weights labs or Chinese AI giants will release a model that demonstrates similar capabilities soon. That'll keep the race neck and neck.

One very important distinction between image models is the implementation: 4o is autoregressive, slow, and extremely expensive.

Although the Ghibli trend is market validation, I suspect that competitors may not want to copy it just yet.

Extremely expensive in what sense? In that it costs $0.03 instead of $0.00003? Yeah, it's relatively far more expensive than other solutions, but in absolute terms it's still very cheap for the vast majority of use cases. And it's a LOT better.

DALL-E is already 4-8 cents per image. AFAIK this is not in the API yet, but I wouldn't be surprised if it's $1 or more.

> 4o is autoregressive, slow, and extremely expensive.

If you factor in the amount of time otherwise wasted on prompting and inpainting, it's extremely well worth it.