by famouswaffles
441 days ago
This video is very good.
https://youtu.be/EzDsrEvdgNQ?si=EWp3U1GMkwg1bMQQ
One thing I'd add is that generating tokens at the target resolution from the start is no longer the only approach to autoregressive image generation. Instead of predicting each patch at the target resolution right away, next-scale prediction starts with the image (as a map of patch tokens) at a very small resolution and progressively scales it up, predicting each larger map conditioned on the smaller ones.
Paper here - https://arxiv.org/abs/2404.02905
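To make the coarse-to-fine idea concrete, here's a toy sketch of the generation order only, not the paper's implementation. Each step emits a whole token map at the next resolution in one go, conditioned on every coarser map upsampled to the current size. `predict_scale`/`toy_predictor` are hypothetical stand-ins for the transformer, and nearest-neighbour upsampling of token ids is an assumption made for illustration (the paper uses a learned multi-scale tokenizer).

```python
from typing import Callable, List, Sequence, Tuple

Grid = List[List[int]]  # square map of discrete token ids


def upsample_nearest(grid: Grid, size: int) -> Grid:
    """Nearest-neighbour upsample of a square token map to size x size (assumed scheme)."""
    h = len(grid)
    return [[grid[i * h // size][j * h // size] for j in range(size)]
            for i in range(size)]


def next_scale_generate(
    scales: Sequence[int],
    predict_scale: Callable[[int, List[Grid]], Grid],
) -> List[Grid]:
    """Generate token maps coarse-to-fine: one model call per scale."""
    maps: List[Grid] = []
    for size in scales:
        # Condition on every coarser map, brought up to the current resolution.
        context = [upsample_nearest(m, size) for m in maps]
        # All size*size tokens of this scale are produced in a single step.
        maps.append(predict_scale(size, context))
    return maps


def toy_predictor(size: int, context: List[Grid]) -> Grid:
    """Hypothetical stand-in for the transformer: deterministic, no learning."""
    acc = [[0] * size for _ in range(size)]
    for m in context:
        for i in range(size):
            for j in range(size):
                acc[i][j] += m[i][j]
    return [[(v + 1) % 7 for v in row] for row in acc]


def step_counts(scales: Sequence[int]) -> Tuple[int, int, int]:
    """(sequential steps, raster-order steps at final size, total tokens generated)."""
    final = scales[-1]
    return len(scales), final * final, sum(s * s for s in scales)


if __name__ == "__main__":
    for m in next_scale_generate([1, 2, 4], toy_predictor):
        print(m)
    print(step_counts([1, 2, 3, 4, 5, 6, 8, 10, 13, 16]))
```

With an assumed schedule of scales (1, 2, 3, 4, 5, 6, 8, 10, 13, 16) ending at a 16x16 token map, `step_counts` returns `(10, 256, 680)`: 10 sequential model calls instead of 256 for raster-order next-token prediction, at the cost of generating 680 tokens in total across all scales.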