by erwannmillon
1057 days ago
I think inference time was on the order of 4-5 seconds per image on a V100, which you can rent for about $0.80 an hour, though you can now get much better GPUs like A100s for ~$1.10/h. But of course this is at 64px resolution in pixel space. If you wanted to do this at high resolution, you would definitely use a latent diffusion model. The autoencoder is almost free to run and significantly reduces the dimensionality of high-res images, which makes it a lot cheaper to run the diffusion model iteratively for many denoising steps.
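For a rough sense of the numbers, here is a back-of-the-envelope sketch. It assumes 4.5 s/image (the midpoint of the 4-5 s quoted above) and, purely as an illustrative assumption, a Stable-Diffusion-style autoencoder that downsamples 8x spatially into a 4-channel latent. The helper names `cost_per_image` and `latent_reduction` are hypothetical.

```python
# Back-of-the-envelope estimates; not the original author's code.
# Assumptions: 4.5 s/image (midpoint of 4-5 s) at $0.80/h on a V100, and an
# autoencoder with 8x spatial downsampling to 4 latent channels (SD-style).

def cost_per_image(seconds_per_image: float, usd_per_hour: float) -> float:
    """Rental cost in USD to generate one image."""
    return seconds_per_image * usd_per_hour / 3600.0

def latent_reduction(height: int, width: int, channels: int = 3,
                     downsample: int = 8, latent_channels: int = 4) -> float:
    """Ratio of pixel-space values to latent-space values for one image."""
    pixels = height * width * channels
    latent = (height // downsample) * (width // downsample) * latent_channels
    return pixels / latent

v100 = cost_per_image(4.5, 0.80)
print(f"V100: ${v100:.4f} per image, ~{1 / v100:.0f} images per dollar")
print(f"512x512 RGB -> latent: {latent_reduction(512, 512):.0f}x fewer values")
```

Under these assumptions a 512x512 RGB image (786,432 values) maps to a 64x64x4 latent (16,384 values), which is roughly the size of the 64px RGB image (12,288 values) the model above denoises directly in pixel space.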