by porphyra
440 days ago
ChatGPT 4o's advanced image generation seems to have a low-resolution autoregressive part that generates tokens directly, followed by an upscaling decoder step that turns the (perhaps 100 px wide) token image into the final 1024 px wide result. The first step can get things almost exactly right, but the second always changes things slightly. That's why it is so good at, say, generating large text but still struggles with fine text, and why it always introduces subtle variations when you ask it to edit an existing image.
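To make the resolution argument concrete, here is a toy sketch. It is not how the actual model works: it assumes a 128-wide grid (close to the "perhaps 100" guess, chosen so it divides 1024 evenly), and it uses plain block averaging and nearest-neighbour upscaling in place of the learned tokenizer and decoder. All names are made up for illustration. It only shows how much detail a coarse grid can carry at all.

```python
FINAL_WIDTH = 1024  # final image width mentioned in the comment
GRID_WIDTH = 128    # assumed token-grid width (the comment guesses "perhaps 100")
CELL = FINAL_WIDTH // GRID_WIDTH  # final pixels covered by one grid cell


def stripes(stroke_px):
    """One image row of alternating black/white strokes, each stroke_px wide."""
    return [float((x // stroke_px) % 2) for x in range(FINAL_WIDTH)]


def to_grid(row):
    """Stand-in for the low-res step: average each block of CELL pixels."""
    return [sum(row[i:i + CELL]) / CELL for i in range(0, FINAL_WIDTH, CELL)]


def upscale(grid):
    """Stand-in for the decoder: nearest-neighbour upscaling to full width."""
    return [value for value in grid for _ in range(CELL)]


def round_trip_error(stroke_px):
    """Mean absolute pixel error after going through the coarse grid and back."""
    row = stripes(stroke_px)
    restored = upscale(to_grid(row))
    return sum(abs(a - b) for a, b in zip(row, restored)) / FINAL_WIDTH


large_text_error = round_trip_error(64)  # strokes much wider than one cell
fine_text_error = round_trip_error(2)    # strokes much narrower than one cell
print(large_text_error, fine_text_error)
```

With these assumptions the 64 px strokes come back unchanged, while the 2 px strokes average out to flat gray inside every cell. A learned decoder would fill in plausible-looking detail rather than gray, but it would have to invent it, which fits the fine-text and edit-drift behaviour described above.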