by bobbylarrybobby
204 days ago
What determines which “average” AI models latch onto? At a pixel level, the average of every image is a grayish rectangle; that's obviously not what we mean, and AI does not produce that. At a slightly higher level, the average of every image is the average of every subject ever photographed or drawn (human, tree, house, plate of food, ...) in concept space; but AI still doesn't generate a human with branches or a house with spaghetti on it. At a still higher level there are things we recognize as sensible scenes, e.g., a barista pouring a cup of coffee, an anime scene of a guy fighting a robot, a watercolor of a boat on a lake, which AI still does not (by default) average into, say, an equal-parts watercolor/anime/photorealistic image of a barista fighting a robot on a boat while pouring a cup of coffee. But it is undeniable that AI images do have an “average” feel to them. What causes this? What is the space over which AI is taking an average to produce its output? One possible answer is that a finite model size means the model can only explore image space at a limited resolution; as models get bigger/better they can average over a smaller and smaller portion of this space, but the resolution is always limited. But that raises the question of why models don't just naturally land on a single point in image space. Is this just a limitation of training, which punishes big failures more strongly than it rewards perfection? Or is there something else at play here that's preventing models from landing directly on a “real” image?
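That isn't correct: pixel values in real-world images aren't uniformly distributed over [0, 255]. Take, for example, the famous ImageNet normalization magic numbers: mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225] per RGB channel, with pixel values scaled to [0, 1].
If pixel values were actually uniformly distributed, the mean for each channel would be 0.5 and the standard deviation would be about 0.289 (1/√12). Also, because of z-normalization (subtracting the channel mean and dividing by the channel standard deviation), the "image" most image models see is not what humans typically see.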
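To make the comparison concrete, here is a minimal sketch that sets the commonly used ImageNet per-channel statistics next to what a uniform distribution would give, and z-normalizes a single mid-gray pixel. The helper names `uniform_stats` and `z_normalize` are made up for illustration, and the constants are assumed to be the usual RGB values for pixels scaled to [0, 1].

```python
import math

# Widely used ImageNet per-channel statistics (RGB order, pixel values in [0, 1]).
# Assumed here to be the "magic numbers" the comment refers to.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def uniform_stats(low=0.0, high=1.0):
    """Mean and standard deviation of a continuous uniform distribution on [low, high]."""
    mean = (low + high) / 2
    std = (high - low) / math.sqrt(12)
    return mean, std


def z_normalize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Z-normalize one RGB pixel given as 0-255 values (hypothetical helper)."""
    return tuple((v / 255 - m) / s for v, m, s in zip(pixel, mean, std))


u_mean, u_std = uniform_stats()
print(f"uniform: mean={u_mean:.3f}, std={u_std:.3f}")

# Show how far each channel's statistics sit from the uniform case.
for name, m, s in zip("RGB", IMAGENET_MEAN, IMAGENET_STD):
    print(f"{name}: mean={m:.3f} ({m - u_mean:+.3f}), std={s:.3f} ({s - u_std:+.3f})")

# A mid-gray pixel is not zero after normalization, and each channel differs.
mid_gray = z_normalize((128, 128, 128))
print("mid-gray after z-normalization:", tuple(round(v, 3) for v in mid_gray))
```

Every channel's standard deviation comes out smaller than the uniform value and the means sit below 0.5, which is the point of the reply: real images are darker and less spread out than uniform noise, and a "neutral" gray pixel ends up as three different positive numbers by the time the model sees it.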