Hacker News

by famouswaffles 441 days ago

This video is very good. https://youtu.be/EzDsrEvdgNQ?si=EWp3U1GMkwg1bMQQ

One thing I'd add is that generating tokens at the target resolution from the start is no longer the only approach to autoregressive image generation.

Rather than predicting each patch at the target resolution right away, next-scale prediction starts with the image (as patches) at a very small resolution and progressively scales it up, predicting a whole token map at each scale. Paper here: https://arxiv.org/abs/2404.02905
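Here's a toy sketch (plain Python, no real model) of the two generation orders to make the difference concrete. `toy_predict_token` and `toy_predict_map` are hypothetical random stand-ins for a trained transformer, and the scale schedule is made up for illustration. The toy just refines the previous map and doesn't model how the paper's tokenizer actually combines scales.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]  # a square map of discrete token ids

def upsample_nearest(grid: Grid, size: int) -> Grid:
    """Nearest-neighbour resize of a square grid to size x size."""
    n = len(grid)
    return [[grid[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]

def raster_generate(side: int,
                    predict_token: Callable[[List[int]], int]) -> Tuple[Grid, int]:
    """Next-token AR: one token per sequential step, row by row at full resolution."""
    seq: List[int] = []
    for _ in range(side * side):
        seq.append(predict_token(seq))  # conditions on all earlier tokens
    grid = [seq[r * side:(r + 1) * side] for r in range(side)]
    return grid, len(seq)  # number of sequential model calls

def next_scale_generate(scales: List[int],
                        predict_map: Callable[[List[Grid], int], Grid]) -> Tuple[Grid, int]:
    """Next-scale AR: one whole token map per sequential step, coarse to fine."""
    maps: List[Grid] = []
    for size in scales:
        maps.append(predict_map(maps, size))  # all size*size tokens in one step
    return maps[-1], len(maps)

# Hypothetical stand-ins for a trained model (random tokens, vocab of 16).
rng = random.Random(0)
VOCAB = 16

def toy_predict_token(prefix: List[int]) -> int:
    return rng.randrange(VOCAB)

def toy_predict_map(coarser: List[Grid], size: int) -> Grid:
    if not coarser:
        return [[rng.randrange(VOCAB) for _ in range(size)] for _ in range(size)]
    # Start from the previous map blown up to the new size, then perturb some cells.
    base = upsample_nearest(coarser[-1], size)
    return [[t if rng.random() < 0.8 else rng.randrange(VOCAB) for t in row]
            for row in base]

if __name__ == "__main__":
    side = 16
    scales = [1, 2, 3, 4, 6, 8, 12, 16]  # illustrative schedule, not from the paper
    _, raster_steps = raster_generate(side, toy_predict_token)
    final, scale_steps = next_scale_generate(scales, toy_predict_map)
    print("raster steps:", raster_steps)                  # 256
    print("next-scale steps:", scale_steps)               # 8
    print("final map:", len(final), "x", len(final[0]))   # 16 x 16
```

The point is the loop structure. Raster order needs one sequential model call per token (256 for a 16×16 map). Coarse-to-fine needs one call per scale, each emitting a whole map conditioned on the coarser ones, so the sequential step count grows with the number of scales rather than the number of tokens.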