|
|
|
|
|
by mlsu
204 days ago
|
|
We are just very sharp when it comes to seeing small differences in images. I'm reminded of when the air force decided to create a pilot seat that worked for everyone. They took the average body dimensions of all their recruits and designed a seat to fit the average. It turned out, the seat fit none of their recruits. [1] I think AI image generation is a lot like this. When you train on all images, you get to this weird sort of average space. AI images look like that, and we recognize it immediately. You can prompt or fine tune image models to get away from this, though -- the features are there it's a matter of getting them out. Lots of people trying stuff like this: https://www.reddit.com/r/StableDiffusion/comments/1euqwhr/re..., the results are nearly impossible to distinguish from real images. [1] https://www.thestar.com/news/insight/when-u-s-air-force-disc... |
|
But it is undeniable that AI images do have an “average” feel to them. What causes this? What is the space over which AI is taking an average to produce its output? One possible answer is that a finite model size means that the model can only explore image space with a limited resolution, and as models get bigger/better they can average over a smaller and smaller portion of this space, but it is always limited.
But that raises the question of why models don't just naturally land on a point in image space. Is this just a limitation of training, which punishes big failures more strongly than it rewards perfection? Or is there something else at play here that's preventing models from landing directly on a “real” image?