Hacker News
by erwannmillon 1048 days ago
Technically yes: the encoder and U-Net are convolutional and support arbitrary input sizes, but the model was trained at 64x64 px because of compute limitations. You could probably resume training from a 64x64 checkpoint and continue training at a higher resolution.

But, like most diffusion models, it doesn't generalize very well to resolutions outside its training data.
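
To make the "arbitrary input sizes" point concrete, here is a minimal pure-Python sketch of a single zero-padded 3x3 convolution. It is not the model's actual code: `conv2d_same`, `shape`, and `KERNEL` are made-up names, and the box-blur weights are a stand-in for learned parameters. It only shows that the same weights run on a 64x64 input and on a larger one, which says nothing about whether the output at the larger size would look good.

```python
# Pure-Python sketch: one 3x3 convolution with zero padding ("same" size).
# The kernel is a fixed set of weights that knows nothing about image size.

def conv2d_same(image, kernel):
    """Cross-correlate image with kernel using zero padding; output size == input size."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Positions outside the image count as zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += image[yy][xx] * kernel[i][j]
            out[y][x] = acc
    return out

def shape(grid):
    """Return (height, width) of a list-of-lists grid."""
    return (len(grid), len(grid[0]))

# Hypothetical weights: a 3x3 box blur standing in for learned parameters.
KERNEL = [[1 / 9] * 3 for _ in range(3)]

small = [[1.0] * 64 for _ in range(64)]    # the training resolution
large = [[1.0] * 128 for _ in range(96)]   # a size never seen in training

out_small = conv2d_same(small, KERNEL)
out_large = conv2d_same(large, KERNEL)
print(shape(out_small), shape(out_large))  # (64, 64) (96, 128)
```

Nothing in the layer depends on the input size, so the architecture accepts the larger image. The learned weights are still tuned to features at the 64x64 scale, which is why quality tends to drop without further training at the higher resolution.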