by bjourne 1047 days ago
Isn't this product kind of impossible? Like a compression program that compresses compressed files? If you have an algorithm for determining whether a generated image is good or bad couldn't the same logic be incorporated into the network so that it doesn't generate bad images?
3 comments

Not impossible at all - classifier networks are much, much easier to train than generative networks. However you can’t directly integrate the logic into the generator, you’d have to train the generator against the discriminator network. This is essentially the principle of a GAN, and although many tricks have been developed in recent years, GANs tend to be finicky and difficult to train.

Diffusion models like SD are instead trained with a very simple loss function: the L2 loss of an iterative denoising process. This tends to give more stable training than GANs. That said, you could fine-tune SD with reinforcement learning using the deformity detector as the reward, but it’s not a panacea, as it could lead to overfitting and performance degradation.
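To make the "L2 loss of a denoising process" concrete, here is a toy sketch of a single training-step loss in plain Python. It is not SD's actual training code: the names `add_noise`, `denoising_l2_loss`, `predict_noise` and `alpha_bar` are made up for illustration, and a real model works on image tensors with a learned network rather than lists of floats.

```python
import math
import random

def add_noise(x0, eps, alpha_bar):
    """Forward process: mix clean data with Gaussian noise at one noise level."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]

def denoising_l2_loss(predict_noise, x0, alpha_bar, rng):
    """Mean squared error between the noise that was added and the model's guess."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]      # noise we add
    x_t = add_noise(x0, eps, alpha_bar)          # noisy sample the model sees
    eps_hat = predict_noise(x_t, alpha_bar)      # hypothetical model call
    return sum((e - h) ** 2 for e, h in zip(eps, eps_hat)) / len(x0)

# A stand-in "model" that always guesses zero noise (assumption, for illustration).
def zero_model(x_t, alpha_bar):
    return [0.0 for _ in x_t]

loss = denoising_l2_loss(zero_model, [0.5, -1.0, 2.0], 0.5, random.Random(0))
```

There is no second network to balance against here, just a regression target, which is roughly why training tends to behave better than the generator-versus-discriminator setup.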

> Not impossible at all - classifier networks are much, much easier to train than generative networks. However you can’t directly integrate the logic into the generator, you’d have to train the generator against the discriminator network.

Generative networks are ime not at all difficult to train because the amount of training data is typically orders of magnitude larger. In this case, the idea is to train something to classify images as high or low quality, which I think is just as hard as generating images. Regardless, if you had such logic, I don't see why you couldn't incorporate that into the network's own loss function. That's how it is done for L1 and L2 regularization and many other techniques for "tempering" the training process.
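A rough sketch of what "incorporate it into the loss" could look like, next to ordinary L2 regularization. Everything here is hypothetical: `quality_score` stands in for the output of some quality classifier in [0, 1], and `l2_weight` / `quality_weight` are made-up knobs. In a real setup the classifier term would also have to be differentiable with respect to the generator's output, which is the part the parent comment says amounts to GAN-style training.

```python
def l2_penalty(weights):
    """Standard L2 regularization term: sum of squared weights."""
    return sum(w * w for w in weights)

def total_loss(task_loss, weights, quality_score,
               l2_weight=0.01, quality_weight=1.0):
    """Task loss plus two penalty terms.

    quality_score is a hypothetical classifier output in [0, 1],
    where 1.0 means "looks fine" and 0.0 means "clearly deformed".
    """
    quality_penalty = 1.0 - quality_score
    return task_loss + l2_weight * l2_penalty(weights) + quality_weight * quality_penalty

good = total_loss(1.0, [1.0, 2.0], quality_score=1.0)
bad = total_loss(1.0, [1.0, 2.0], quality_score=0.2)
```

The same sample gets a higher loss when the classifier flags it, so gradient descent would be pushed away from outputs the classifier dislikes, assuming the gradient can flow through the classifier.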

The problem is that you want the model to be creative but not "too creative" (e.g. eight-fingered hands). But preventing it from being too creative risks making it boring and bland (e.g. only generating stock images). I don't think you can solve that with a post-processing filter. Generating, say, 100 images and picking the "best" one might just be the same as picking the most bland one.
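A toy illustration of that worry. Each "image" is reduced to a single number measuring how far it is from a typical image, and `typicality_score` is a hypothetical scorer that happens to reward being close to average. Whether a real quality model behaves like this is exactly the open question; the sketch only shows what best-of-100 does if it does.

```python
import random

def pick_best(candidates, score):
    """Best-of-N selection: keep the candidate with the highest score."""
    return max(candidates, key=score)

def typicality_score(image, reference=0.0):
    """Hypothetical scorer: higher the closer an image is to the 'average' one."""
    return -abs(image - reference)

rng = random.Random(0)
# 100 candidates; the value is a stand-in for "how unusual this image is".
candidates = [rng.gauss(0.0, 1.0) for _ in range(100)]
best = pick_best(candidates, typicality_score)
```

Under this assumption the filter always returns the least unusual candidate of the batch, so more samples means a blander pick, not a better one.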

That's essentially how using a GAN works.

E: or how it's supposed to work.

Kind of phenomenological but both parts of the GAN are the same model.
We’re optimistic about using our own algorithms and models to evaluate another model. In theoretical computer science, it is easier to verify a correct solution than to generate a correct solution (P vs NP problem).
I don't think P vs NP has anything to do with it, but also, I don't think your maxim is always true anyway (though maybe it often is).

Problem: traveling salesman, solution: one particular path. I think verification that the solution is optimal in this case is exactly the same problem as finding the solution.
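A small sketch of the distinction being argued about, on a made-up 4-city distance matrix. Checking that a given path is a valid tour within some length budget is a quick sum; confirming that it is optimal, at least in this naive version, means searching the other tours too. The function names and the matrix are illustrative only.

```python
from itertools import permutations

def tour_length(dist, tour):
    """Total length of a closed tour that returns to its starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def verify_within_budget(dist, tour, budget):
    """Cheap check: does the tour visit every city once and stay within budget?"""
    visits_all = sorted(tour) == list(range(len(dist)))
    return visits_all and tour_length(dist, tour) <= budget

def brute_force_optimum(dist):
    """Expensive check: try every tour starting at city 0 and keep the shortest."""
    n = len(dist)
    tours = ((0,) + p for p in permutations(range(1, n)))
    best = min(tours, key=lambda t: tour_length(dist, t))
    return list(best), tour_length(dist, best)

# Hypothetical symmetric distances between 4 cities.
dist = [
    [0, 1, 4, 3],
    [1, 0, 2, 5],
    [4, 2, 0, 1],
    [3, 5, 1, 0],
]
ok = verify_within_budget(dist, [0, 1, 2, 3], budget=7)
best_tour, best_len = brute_force_optimum(dist)
```

The budget check runs in time linear in the number of cities, while the brute-force search grows factorially, which is the gap between "is this tour good enough?" and "is this tour the best one?".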

Not to nitpick but this is NOT the right takeaway from P vs NP.
Do you, or will you, use human labor at any point in evaluating images?
We currently don't, since it's not scalable.