by joefourier
1044 days ago
Not impossible at all: classifier networks are much, much easier to train than generative networks. However, you can’t directly integrate the logic into the generator; you’d have to train the generator against the discriminator network. This is essentially the principle of a GAN, and although many tricks have been developed in recent years, GANs tend to be finicky and difficult to train. Diffusion models like SD are instead trained with a very simple loss function, which is just the L2 loss of an iterative denoising process. This tends to result in more stable training than with GANs. That said, you could fine-tune SD with reinforcement learning using the deformity detector as the reward, but it’s not a panacea, as it could lead to overfitting and performance degradation.
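To make the "very simple loss function" concrete, here is a rough sketch in plain Python. It assumes the common noise-prediction formulation, where the model sees a noised sample and is scored on how well it recovers the noise that was added. `add_noise`, `denoising_l2_loss`, `zero_model` and `alpha_bar` are illustrative names, not SD's actual training code.

```python
import math
import random

def add_noise(x0, noise, alpha_bar):
    """Forward diffusion: blend the clean sample with Gaussian noise."""
    a = math.sqrt(alpha_bar)
    b = math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, noise)]

def denoising_l2_loss(predict_noise, x0, alpha_bar, rng):
    """Mean squared error between the sampled noise and the model's guess."""
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = add_noise(x0, noise, alpha_bar)
    predicted = predict_noise(x_t, alpha_bar)
    return sum((p - e) ** 2 for p, e in zip(predicted, noise)) / len(x0)

# Hypothetical stand-in for a network: always predicts zero noise.
def zero_model(x_t, alpha_bar):
    return [0.0] * len(x_t)

rng = random.Random(0)
loss = denoising_l2_loss(zero_model, [0.5, -0.2, 0.1, 0.9], alpha_bar=0.7, rng=rng)
```

There is no second network to balance against here, which is a large part of why this objective tends to be less finicky than the adversarial setup.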
|
Generative networks are, in my experience, not at all difficult to train because the amount of training data is typically orders of magnitude larger. In this case, the idea is to train something to classify images as high or low quality, which I think is just as hard as generating images. Regardless, if you had such logic, I don't see why you couldn't incorporate it into the network's own loss function. That's how it is done for L1 and L2 regularization and many other techniques for "tempering" the training process.
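For reference, this is roughly what "incorporating it into the loss" looks like for plain L2 regularization: a penalty term added to the task loss with a weighting factor. The function names and the `lam` parameter are made up for illustration. A quality-classifier penalty would take the place of `l2_penalty`, under the assumption that its score can be computed (and differentiated) on the generator's output.

```python
def mse(predictions, targets):
    """Task loss: mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

def l2_penalty(weights):
    """Regularization term: sum of squared weights."""
    return sum(w * w for w in weights)

def regularized_loss(predictions, targets, weights, lam):
    """Total loss = task loss + lam * penalty."""
    return mse(predictions, targets) + lam * l2_penalty(weights)

# mse = 2.0, penalty = 1.25, so the total is 2.0 + 0.5 * 1.25 = 2.625
loss = regularized_loss([1.0, 2.0], [1.0, 4.0], weights=[0.5, -1.0], lam=0.5)
```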
The problem is that you want the model to be creative but not "too creative" (e.g., eight-fingered hands). But preventing it from being too creative risks making it boring and bland (e.g., only generating stock images). I don't think you can solve that with a post-processing filter. Generating, say, 100 images and picking the "best" one might just be the same as picking the most bland one.
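A toy sketch of that worry, in Python. Everything here is hypothetical: each "image" is reduced to a single novelty number, and the quality scorer is assumed to trust familiar-looking images more. Under that assumption, best-of-100 always returns the least novel candidate.

```python
import random

def generate_candidates(n, rng):
    """Hypothetical generator: each 'image' is just a novelty value in [0, 1)."""
    return [rng.random() for _ in range(n)]

def quality_score(novelty):
    """Hypothetical detector, assumed to score familiar images higher."""
    return 1.0 - novelty

def best_of_n(candidates, score):
    """Post-processing filter: keep the single highest-scoring candidate."""
    return max(candidates, key=score)

rng = random.Random(0)
candidates = generate_candidates(100, rng)
chosen = best_of_n(candidates, quality_score)
```

Whether a real quality classifier behaves like `quality_score` is exactly the open question; if its score is independent of novelty, the filter would not have this bias.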