by the_origami_fox 1253 days ago
Author here. I have noticed similar behaviour. As part of this exercise I tried to train a model to generate Pokemon, based on This Pokemon Does Not Exist by HuggingFace. However, my models only converged to noisy smudges after 50 iterations, so I excluded them from the posts (I do mention my experiments at the end of part 2).

My first assumption was that the model I was training was too small: 13 million parameters as opposed to the 1.3 billion in ruDALL-E (not sure how much of this is only the diffusion model). So that's 100x smaller. I want to experiment with scaling it up.

Reading this, I'm wondering if there's more I need to do. For example, training a conditioned model - "cheat" by giving it the index of the Pokemon during training, then sample without an index - or making the model predict the variance (beta tilde). Or, as you say, work with loss functions.
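
For reference, here is a rough sketch of what "beta tilde" means, as I understand it. In the DDPM notation it is the posterior variance (its square root is the standard deviation), and the learned-variance trick from the Improved DDPM paper has the network output a value v in [0, 1] that interpolates in log space between beta tilde and beta. The linear schedule, its endpoints, and all function names below are assumptions for illustration, not what my posts use.

```python
import math

def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule beta_1..beta_T (endpoints are assumed defaults)."""
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]

def posterior_variances(betas):
    """beta_tilde_t = (1 - abar_{t-1}) / (1 - abar_t) * beta_t, with abar_0 = 1."""
    out = []
    abar_prev = 1.0  # cumulative product of (1 - beta) before the first step
    for beta in betas:
        abar = abar_prev * (1.0 - beta)
        out.append((1.0 - abar_prev) / (1.0 - abar) * beta)
        abar_prev = abar
    return out

def learned_variance(v, beta, beta_tilde):
    """Interpolate in log space: v=0 gives beta_tilde, v=1 gives beta.

    beta_tilde is exactly 0 at the first step, so its log is undefined there;
    this sketch expects the caller to skip or clip that step.
    """
    return math.exp(v * math.log(beta) + (1.0 - v) * math.log(beta_tilde))

betas = linear_betas(1000)
beta_tildes = posterior_variances(betas)
# Halfway between the two bounds at an arbitrary step (index 500).
sigma_sq = learned_variance(0.5, betas[500], beta_tildes[500])
```

With a fixed variance you would pick either beta or beta tilde at each step; the learned version lets the model choose a point between them per step.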

More work to be done here.