I've been trying to keep up with this field (image generation), so here are some quick notes I took:

Claude's Summary: "Normalizing flows aren't dead, they just needed modern techniques"

My Summary: "Transformers aren't just for text"

1. SOTA model for likelihood on ImageNet 64×64: 2.99 bits per dimension (lower is better), the first ever sub-3. The previous best was 3.2, by a hybrid diffusion model.

2. Autoregressive (transformer) approach. Right now diffusion is the most popular in this space (it's much faster, but a different approach).

tl;dr of autoregressive vs diffusion (there are other approaches too):

Autoregression: step-based, generate a little, then more, then more

Diffusion: generate a lot of noise, then try to clean it up

The diffusion approach that is the baseline for SOTA is Flow Matching from Meta: https://arxiv.org/abs/2210.02747

Lots of fun reading material if you throw both of these into an LLM and ask it to summarize the approaches!
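For anyone wondering what "bits per dimension" actually measures: as far as I understand it, it's the model's negative log-likelihood for an image, converted from nats to bits and divided by the number of values in the image (64 × 64 × 3 for ImageNet 64×64). Here's a rough sketch of that conversion. The function name and the example numbers are mine, not from either paper.

```python
import math

def bits_per_dim(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood (in nats) to bits per dimension."""
    # Dividing by ln(2) converts nats to bits; dividing by num_dims averages per value.
    return nll_nats / (num_dims * math.log(2))

# ImageNet 64x64 RGB: one "dimension" per pixel channel (my assumption for illustration).
NUM_DIMS = 64 * 64 * 3

# Sanity check: a model that assigns equal probability to all 256 values of every
# 8-bit channel has NLL = NUM_DIMS * ln(256), which works out to 8 bits per dimension.
uniform_nll = NUM_DIMS * math.log(256)
uniform_bpd = bits_per_dim(uniform_nll, NUM_DIMS)

# Going the other way: total bits the model "needs" for one image at a given BPD.
def total_bits(bpd: float, num_dims: int) -> float:
    return bpd * num_dims
```

So a uniform guess costs 8 bits per value, and a score around 3 means the model is explaining each image in well under half that.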
[0] https://arxiv.org/abs/2412.06264