by radarsat1
1253 days ago
The data is mel spectrograms. To be clear about the conditional labels: I was trying to get the model to come up with a vector-quantized code, so the data isn't conditionally labeled. Instead, I was using an embedding layer with a VQ layer so that it learns its own codebook. This works well with VQGAN, so I was surprised that with diffusion it just keeps setting all the codes to the same value and ignoring them, but maybe I'm doing something wrong. Still working on it. I'm just saying that I expected this method to be less finicky than GANs because it uses an MSE loss, but unfortunately it seems to have its own difficulties. No silver bullet, I guess. The integration sampling can be quite sensitive to imperfections and diverge easily, at least in the early stages of training. I decided to write this because it feels like the early days of GANs: there seem to be lots of these "explain diffusion from scratch" articles out there, but not yet many discussing common pitfalls and how to deal with them.
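One way to confirm the kind of collapse described above (all codes ending up at the same value) is to track how many codebook entries actually get used. The following is only a minimal sketch, assuming NumPy and plain nearest-neighbour quantization; `quantize` and `codebook_perplexity` are hypothetical helper names for illustration, not code from the setup discussed here.

```python
import numpy as np

def quantize(z, codebook):
    """Assign each row of z (N, D) to the index of its nearest codebook row (K, D)."""
    # Squared Euclidean distance between every latent and every code: shape (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

def codebook_perplexity(indices, num_codes):
    """exp(entropy) of code usage: 1 means a single code is used, num_codes means uniform use."""
    counts = np.bincount(indices, minlength=num_codes).astype(np.float64)
    probs = counts / counts.sum()
    nz = probs[probs > 0]  # skip unused codes to avoid log(0)
    return float(np.exp(-(nz * np.log(nz)).sum()))

# Toy data (assumed, for illustration only): 8 codes of dimension 4.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))
# "Collapsed" latents all sit near one code; "healthy" latents spread over all codes.
collapsed = np.tile(codebook[3], (64, 1)) + 0.01 * rng.normal(size=(64, 4))
healthy = np.repeat(codebook, 8, axis=0) + 0.01 * rng.normal(size=(64, 4))

print("collapsed:", codebook_perplexity(quantize(collapsed, codebook), 8))
print("healthy:  ", codebook_perplexity(quantize(healthy, codebook), 8))
```

Logging a number like this per batch would make it obvious when usage drops toward a single code, though it only detects the collapse and says nothing about why the model ignores the codes.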
|
Particularly relevant to the noisy training you mentioned earlier is the alternative timestep sampling procedure they propose, which seems to reduce gradient noise significantly, judging from their experiments. Would love to hear or discuss if you have found any other design changes that have improved training or sample quality :)