by erwannmillon
1048 days ago
Technically yes: the encoder and UNet are convolutional and accept arbitrary input sizes, but the model was trained at 64x64 px because of compute limitations. You could probably resume training from the 64x64 checkpoint and continue at a higher resolution. But like most diffusion models, it doesn't generalize very well to resolutions outside its training data.
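To illustrate why a convolutional encoder/UNet "supports arbitrary input sizes", here is a toy sketch in plain Python. It is not this model's code: `conv2d_same`, `downsample2x`, `feature_shape` and the kernel values are hypothetical stand-ins. The point is that the same 3x3 weights slide over a 64x64 or a 128x128 input, and only the output's spatial size changes. Being able to run at a new size says nothing about output quality, which is the generalization caveat above.

```python
def conv2d_same(image, kernel):
    """2D cross-correlation with zero padding, so the output keeps the input's HxW."""
    h, w = len(image), len(image[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the image
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def downsample2x(image):
    """Keep every other row/column (hypothetical stand-in for a strided conv)."""
    return [row[::2] for row in image[::2]]


# Hypothetical weights: fixed size (3x3), independent of the input resolution.
KERNEL = [[0.0, 0.1, 0.0],
          [0.1, 0.6, 0.1],
          [0.0, 0.1, 0.0]]


def feature_shape(size):
    """Run one conv + downsample 'encoder stage' on a size x size input."""
    img = [[1.0] * size for _ in range(size)]
    feat = downsample2x(conv2d_same(img, KERNEL))
    return len(feat), len(feat[0])


for size in (64, 128):
    print(size, "->", feature_shape(size))
```

The same parameters produce a 32x32 feature map from a 64x64 input and a 64x64 one from a 128x128 input. That is why resuming from a 64x64 checkpoint at higher resolution is mechanically possible, even if the weights were only ever optimized on 64x64 images.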