by convexfunction
1234 days ago
Yeah, it's bullshit, but digging into a specific point from their FAQ:

> Usually, the image the model creates doesn’t exist in its training data - it’s new - but because of the training process, the most influential images are the most visually similar ones, especially in the details.

Would be cool if this were true, but I don't think it is, because a visual-similarity ranking completely ignores the prompt you used and the captions on the training images. If two different words tend to be used in captions for very visually similar images, and you use just one of those words in your inference prompt, I'm pretty sure the images captioned with the word you used are much more "influential" on your output than the images captioned with the word you didn't use. (Like, "equestrian" vs. "mountie" or "cowboy" or something.)

Two thought experiments that get at what "influential" should actually mean:
1. Take the prompt you used and run it on a model checkpoint trained identically to whatever model you're using, except with the top 21 images this website shows you removed. Your output won't be identical (I assume), but in most cases you can probably get something pretty similar.
2. Now take that same prompt and run it on a model checkpoint trained only on the top 21 images this website shows you. (AFAIK you can't really do this, because Stability hasn't released a "completely untrained" version of any of their models... though maybe they have and nobody cares because it's useless for most purposes.) I'm not completely sure what you'd get, but my bet is either nonsense or a memorized replica of one of the training images, not the output image you got previously.
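Here's a toy sketch of both experiments, under loud assumptions. The "model" is just a caption-weighted average of 2-D vectors, which is nothing like how Stable Diffusion actually works. "Retraining" simply means recomputing that average over a different list. The data, the 0.01 off-caption weight, and the helper names (`generate`, `top_k_visually_similar`) are all made up for illustration. It uses k=2 instead of 21 because there are only four "images".

```python
import math

# Hypothetical toy training set: (id, caption words, 2-D "image" vector).
# The "cowboy" images are deliberately placed visually close to the "equestrian" ones.
TRAIN = [
    ("A", {"equestrian"}, (1.00, 0.00)),
    ("B", {"equestrian"}, (0.80, 0.20)),
    ("C", {"cowboy"}, (0.90, 0.10)),
    ("D", {"cowboy"}, (0.95, 0.05)),
]


def generate(prompt, train, off_caption_weight=0.01):
    """Toy conditional "model": a caption-weighted average of training images.

    Images whose caption shares a word with the prompt get weight 1.0; all
    others get a small weight. "Retraining" is just calling this on a new list.
    """
    words = set(prompt.split())
    weights = [1.0 if caption & words else off_caption_weight for _, caption, _ in train]
    total = sum(weights)
    return tuple(
        sum(w * img[d] for w, (_, _, img) in zip(weights, train)) / total
        for d in range(len(train[0][2]))
    )


def top_k_visually_similar(output, train, k):
    """What a visual-similarity site would show: the k images nearest the output."""
    return sorted(train, key=lambda item: math.dist(item[2], output))[:k]


prompt = "equestrian"
full = generate(prompt, TRAIN)
top = top_k_visually_similar(full, TRAIN, k=2)
top_ids = {item[0] for item in top}

# Experiment 1: "retrain" without the top-k visually similar images.
shift_without = math.dist(full, generate(prompt, [t for t in TRAIN if t[0] not in top_ids]))
# Experiment 2: "retrain" on only the top-k visually similar images.
shift_only = math.dist(full, generate(prompt, top))

# Leave-one-out influence: how far the output moves when each image is dropped.
influence = {
    item[0]: math.dist(full, generate(prompt, [t for t in TRAIN if t[0] != item[0]]))
    for item in TRAIN
}

print("most visually similar:", sorted(top_ids))
print("shift without top-k:", round(shift_without, 4), "| shift with only top-k:", round(shift_only, 4))
print("most influential:", max(influence, key=influence.get))
```

In this toy, the two "cowboy" images end up nearest to the output. Dropping them barely moves it. Keeping only them moves it noticeably. The leave-one-out influence lands on the "equestrian" images, whose captions actually match the prompt.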