Hacker News
by famouswaffles 441 days ago
This video is very good. https://youtu.be/EzDsrEvdgNQ?si=EWp3U1GMkwg1bMQQ

One thing I'd add: generating tokens at the target resolution from the start is no longer the only approach to autoregressive image generation.

Rather than predicting each patch at the target resolution right away, this newer approach starts with the image (as patches) at a very small resolution and progressively scales up, predicting a whole resolution level per step. Paper here: https://arxiv.org/abs/2404.02905
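A toy Python sketch can make the contrast concrete. It is not the paper's implementation. The "models" are hypothetical random stand-ins, the scale schedule (1, 2, 4, 8) is an assumption, and nearest-neighbour upsampling is just one simple way to show each map building on the coarser one. What matters is the step count. Raster order takes one step per token at the final resolution. Coarse-to-fine order takes one step per scale, and each step emits a whole map.

```python
import random
from typing import Callable, List, Tuple

Grid = List[List[int]]

def raster_generate(size: int, next_token: Callable[[List[int]], int]) -> Tuple[Grid, int]:
    """Classic order: one token per step, left-to-right, top-to-bottom,
    at the target resolution from the start."""
    seq: List[int] = []
    for _ in range(size * size):
        seq.append(next_token(seq))  # condition on every token so far
    grid = [seq[r * size:(r + 1) * size] for r in range(size)]
    return grid, size * size  # one autoregressive step per token

def next_scale_generate(scales: List[int],
                        next_map: Callable[[List[Grid], int], Grid]) -> Tuple[List[Grid], int]:
    """Coarse-to-fine order: each step emits a whole s x s token map,
    conditioned on all coarser maps generated before it."""
    maps: List[Grid] = []
    for s in scales:
        maps.append(next_map(maps, s))
    return maps, len(scales)  # one autoregressive step per scale

def upsample_nearest(g: Grid, s: int) -> Grid:
    """Nearest-neighbour upsampling of a square grid to s x s."""
    n = len(g)
    return [[g[r * n // s][c * n // s] for c in range(s)] for r in range(s)]

# Hypothetical stand-in "models" so the sketch runs; a real system would use
# a trained transformer over discrete (VQ) token ids.
VOCAB = 16
rng = random.Random(0)

def dummy_next_token(prefix: List[int]) -> int:
    return rng.randrange(VOCAB)

def dummy_next_map(coarser: List[Grid], s: int) -> Grid:
    if not coarser:
        return [[rng.randrange(VOCAB) for _ in range(s)] for _ in range(s)]
    base = upsample_nearest(coarser[-1], s)  # start from the previous scale
    return [[(v + rng.randrange(-1, 2)) % VOCAB for v in row] for row in base]

grid, raster_steps = raster_generate(8, dummy_next_token)
maps, scale_steps = next_scale_generate([1, 2, 4, 8], dummy_next_map)
print(raster_steps)            # 64
print(scale_steps)             # 4
print([len(m) for m in maps])  # [1, 2, 4, 8]
```

The coarse-to-fine version emits more tokens in total (1 + 4 + 16 + 64 = 85 here), but it needs far fewer sequential steps. Each step can predict a whole map in parallel.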