by famouswaffles 442 days ago
OpenAI has said both that it's native image generation and that it's autoregressive. It has the signs of it too.

It's probably an implementation of VAR (https://arxiv.org/abs/2404.02905) - autoregressive image generation with a small twist. Rather than predicting every token at the target resolution directly, you start by predicting the image at a small resolution, then crank it higher and higher until you reach the desired resolution.
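
To make the coarse-to-fine idea concrete, here is a rough toy sketch of that loop. It is not the VAR code and certainly not whatever OpenAI runs: `predict_scale` is a hypothetical stand-in for the model, the scale schedule and vocabulary size are made up, and the toy only conditions on the most recent scale, whereas the paper conditions on all coarser ones.

```python
import random


def upsample_nearest(grid, size):
    """Nearest-neighbour resize of a square token grid to size x size."""
    old = len(grid)
    return [[grid[r * old // size][c * old // size] for c in range(size)]
            for r in range(size)]


def predict_scale(context, size, vocab_size, rng):
    """Hypothetical stand-in for the model: one call emits a whole token map.

    A real model would run a transformer here. This toy just perturbs the
    upsampled coarser map so the coarse-to-fine structure is visible.
    """
    if context is None:
        # Coarsest scale: nothing to condition on yet.
        return [[rng.randrange(vocab_size) for _ in range(size)]
                for _ in range(size)]
    base = upsample_nearest(context, size)
    return [[(tok + rng.choice((-1, 0, 1))) % vocab_size for tok in row]
            for row in base]


def generate(scales=(1, 2, 4, 8), vocab_size=16, seed=0):
    """Autoregress over scales rather than over individual tokens."""
    rng = random.Random(seed)
    maps, context = [], None
    for size in scales:
        current = predict_scale(context, size, vocab_size, rng)
        maps.append(current)
        context = current  # the next, finer scale conditions on this one
    return maps


maps = generate()
print([len(m) for m in maps])  # side length of each generated token map
```

The point of the structure: with these assumed scales the loop makes four model calls to reach an 8x8 map, where plain raster-order autoregression would need 64 sequential token steps.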