| HN Mirror



	by embedding-shape 35 days ago
	> Also, GPT-Image-2 is not a diffusion model, it is based on Transformers, like other LLMs are.

	Where are you getting this from btw? AFAIK, OpenAI hasn't openly talked about what exactly is powering the Images 2.0 stuff, unless I missed something? I think they've said it's not a diffusion model, but I'm not sure they've said what they're doing instead, have they?

1 comments

simonw 35 days ago

I believe it's an evolution of the technique used in GPT-Image-1 (or whatever they called that), which was derived from their work on making GPT-4o an "omni" model that can directly output images and audio in addition to text.

The 2024 GPT-4o launch post https://openai.com/index/hello-gpt-4o/ hints about how that works:

"With GPT‑4o, we trained a single new model end-to-end across text, vision, and audio, meaning that all inputs and outputs are processed by the same neural network."


embedding-shape 35 days ago

Yeah, that's my belief as well, but I haven't seen any concrete explanation of how it works, just the marketing/press releases, sadly.
