HN Mirror



	by uh_uh 455 days ago
	Does this top-to-bottom drawing tell us anything about the underlying model architecture? AFAIK diffusion models don't work like that: they denoise the full frame over many steps. There used to be attempts to synthesize a picture gradually by predicting the next pixel, but I wasn't aware of any shift to that kind of architecture at OpenAI.
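	To make the contrast concrete, here is a toy sketch of the two generation orders. It is not how GPT-4o or any real diffusion model is implemented. The "denoiser" is just a fixed gradient image and the "autoregressive model" samples random values. `diffusion_sample` and `autoregressive_sample` are hypothetical names. The only point is what a partial result looks like: diffusion touches every pixel at every step, while raster-order autoregression commits one pixel at a time, so an unfinished image is the top rows plus blanks.

```python
import numpy as np

H, W = 4, 4  # tiny toy image size (assumption, for illustration only)

def diffusion_sample(steps, rng):
    """Toy diffusion-style loop: start from noise and update EVERY pixel each step."""
    # Stand-in for a denoiser's prediction of the clean image (hypothetical).
    target = np.linspace(0.0, 1.0, H * W).reshape(H, W)
    x = rng.standard_normal((H, W))
    snapshots = []
    for t in range(steps):
        alpha = (t + 1) / steps
        # All pixels move toward the prediction simultaneously.
        x = (1 - alpha) * x + alpha * target
        snapshots.append(x.copy())
    return snapshots

def autoregressive_sample(rng):
    """Toy autoregressive loop: emit one pixel at a time in raster (row-major) order."""
    img = np.full((H, W), np.nan)  # NaN marks "not generated yet"
    snapshots = []
    for i in range(H * W):
        r, c = divmod(i, W)
        # A real model would condition on everything above and to the left;
        # here we just sample a random value.
        img[r, c] = rng.random()
        snapshots.append(img.copy())
    return snapshots

rng = np.random.default_rng(0)
ar = autoregressive_sample(rng)
print(ar[H * W // 2 - 1])       # halfway: top two rows filled, bottom two still NaN
diff = diffusion_sample(4, rng)
print(diff[1])                  # halfway: every pixel has a (partially denoised) value
```

	Halfway through the autoregressive loop, the bottom half of the image simply doesn't exist yet, which is exactly the top-to-bottom reveal. Halfway through the diffusion loop, the whole frame is present but noisy.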

2 comments

cubefox 454 days ago

Yes, the model card explicitly says it's autoregressive, not diffusion. And it's not a separate model; it's a native ability of GPT-4o, which is a multimodal model. They just didn't make this ability public until now. I assume they worked on the fine-tuning to improve prompt following.


thesparks 454 days ago

Apparently it's not diffusion, but tokens.
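"Tokens" for an image usually means something like the sketch below, though nothing in this thread says how GPT-4o actually tokenizes images. This assumes a VQ-style scheme: cut the image into fixed-size patches, replace each patch with the index of its nearest vector in a codebook, and read the grid row-major. `PATCH`, `CODEBOOK_SIZE`, `tokenize`, and `detokenize` are hypothetical, and the codebook here is random rather than learned.

```python
import numpy as np

PATCH = 2          # hypothetical patch size in pixels
CODEBOOK_SIZE = 8  # hypothetical number of discrete image tokens

def tokenize(image, codebook):
    """Map each PATCH x PATCH patch to its nearest codebook index, row-major order."""
    h, w = image.shape
    gh, gw = h // PATCH, w // PATCH
    # Cut into patches: (gh, PATCH, gw, PATCH) -> (gh, gw, PATCH, PATCH) -> flat patch list
    patches = (image[:gh * PATCH, :gw * PATCH]
               .reshape(gh, PATCH, gw, PATCH)
               .transpose(0, 2, 1, 3)
               .reshape(gh * gw, PATCH * PATCH))
    # Squared distance from every patch to every codebook vector
    d = ((patches[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), (gh, gw)

def detokenize(tokens, grid, codebook):
    """Inverse of tokenize: look up each token's patch and stitch the grid back together."""
    gh, gw = grid
    patches = codebook[tokens].reshape(gh, gw, PATCH, PATCH)
    return patches.transpose(0, 2, 1, 3).reshape(gh * PATCH, gw * PATCH)

rng = np.random.default_rng(0)
codebook = rng.random((CODEBOOK_SIZE, PATCH * PATCH))
img = detokenize(rng.integers(0, CODEBOOK_SIZE, size=6), (2, 3), codebook)  # 4x6 toy image
tokens, grid = tokenize(img, codebook)
# Decoding only the first grid-row's worth of tokens yields just the top strip of the image.
top_strip = detokenize(tokens[:grid[1]], (1, grid[1]), codebook)
```

If a model emits tokens in that row-major order and the client decodes as they arrive, the picture fills in from the top down, which would match the behavior described in the parent comment.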
