HN Mirror



	by alittletooraph2 440 days ago
	I tried to make Studio Ghibli inspired images using presumably their new models. It was ass.

2 comments

simonw 440 days ago

Llama is not an image generating model. Any interface that uses Llama and generates images is calling out to a separate image generator as a tool, like OpenAI used to do with ChatGPT and DALL-E up until a couple of weeks ago: https://simonwillison.net/2023/Oct/26/add-a-walrus/


echelon 440 days ago

GPT-4o image generation is the future of all image gen.

Every other player: Black Forest Labs' Flux, Stability.ai's Stable Diffusion, and even closed models like Ideogram and Midjourney, are all on the path to extinction.

Image generation and editing must be multimodal. Full stop.

Google Imagen will probably be the first model to match the capabilities of 4o. I'm hoping one of the open weights labs or Chinese AI giants will release a model that demonstrates similar capabilities soon. That'll keep the race neck and neck.


minimaxir 440 days ago

One very important distinction between image models is the implementation: 4o is autoregressive, slow, and extremely expensive.

Although the Ghibli trend is market validation, I suspect that competitors may not want to copy it just yet.


JamesBarney 440 days ago

Extremely expensive in what sense? In that it costs $0.03 instead of $0.00003? Yeah, it's relatively far more expensive than other solutions, but from an absolute standpoint it's still very cheap for the vast majority of use cases. And it's a LOT better.


svachalek 440 days ago

DALL-E is already 4-8 cents per image. AFAIK this is not in the API yet, but I wouldn't be surprised if it's $1 or more.


echelon 440 days ago

> 4o is autoregressive, slow, and extremely expensive.

If you factor in the amount of time wasted on prompting and inpainting, it's well worth it.
