| What you are talking about is overfitting. It only happens with images that appear far too many times, in too many forms, in the training set, usually the most iconic images of all time, such as the Mona Lisa. Naturally, hyper-iconic images are the first thing that comes to mind when humans test for the issue, because those images are seared into our brains too. And, much like with our brains, when it happens the model doesn't exactly reproduce parts of the source image, though you have to actively pay attention to notice that. It makes an image that is overly similar conceptually. To our brains that feels the same, so it's enough to convince someone at a glance that it is the same. But if you look at an overfit result for "The Beatles Abbey Road album cover", you'll see things like: the band members are crossing the road, but they are all variations of Ringo; vehicles from that era are in the background, but in a different arrangement, and none of them are directly from the source; the band members are wearing suits, but in the wrong style and color; the road has the wrong number of stripes. It's not the same as a highly skilled human drawing an iconic image from memory, but it sure is darn similar. Besides all that, everyone working in the tech considers the overfitting of iconic images a failure case that is being actively addressed. It won't be long before it stops happening entirely. In the meantime, I'd challenge anyone to make an overfit result that significantly reproduces a specific work by every prompter's favorite, Greg Rutkowski, using DALL-E, Midjourney, or the Stable Diffusion models released directly by Stability AI. Greg's pixels aren't in the model file to be copied, only a conceptual impression of his style. |
Not really, though that is another legitimate issue.
I was talking about 1) the fundamental training and inference process, which remembers pixels, not concepts or techniques. Today's AI learns to create imagery in a fundamentally different way than people do. And 2) text-prompted image generation models like Stable Diffusion can easily be made to reproduce training data with a prompt that is narrow and specific enough. This is not overfitting; it's a consequence of the fact that some training inputs are quite unique, and the prompt can be used to focus on that uniqueness.