by astrange
969 days ago
AI image models are almost entirely not trained on human-labeled data: Stable Diffusion is trained on text scraped from near each image on the page, DALL-E 3 uses synthetic captions from an image-to-text model, and Midjourney doesn't disclose what they do. You can't get humans to label a billion images. One way you can tell this isn't true is to take an image model and prompt it with an image, or just surf through the latent space by changing the embeddings: you'll find absolutely everything in there, from non-stereotypical representations to indescribable things.
|
And that nearby text was written by humans. It may not be an explicit label in an HTML attribute, but if the surrounding text weren't related to the image, the scraping wouldn't work.
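
To make "scraping nearby text" concrete, here is a rough sketch using only Python's standard library. It is not how any particular dataset was actually built; the class name `ImageTextPairs`, the helper `extract_pairs`, and the "first bit of text after the image" heuristic are all assumptions made up for illustration.

```python
from html.parser import HTMLParser


class ImageTextPairs(HTMLParser):
    """Hypothetical helper: pair each <img> with its alt text and the
    first non-empty text that follows it in the page."""

    def __init__(self):
        super().__init__()
        self.pairs = []        # finished (src, alt, nearby_text) tuples
        self._pending = None   # most recent image, not yet stored

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self._flush()  # store the previous image before starting a new one
            a = dict(attrs)
            self._pending = {
                "src": a.get("src") or "",
                "alt": a.get("alt") or "",
                "text": "",
            }

    def handle_data(self, data):
        text = data.strip()
        # Assumed heuristic: only the first text after the image counts as "nearby".
        if text and self._pending is not None and not self._pending["text"]:
            self._pending["text"] = text

    def _flush(self):
        if self._pending is not None:
            p = self._pending
            self.pairs.append((p["src"], p["alt"], p["text"]))
            self._pending = None

    def close(self):
        super().close()
        self._flush()  # store the last image on the page


def extract_pairs(html):
    parser = ImageTextPairs()
    parser.feed(html)
    parser.close()
    return parser.pairs


page = (
    '<p><img src="cat.jpg" alt="a tabby cat"> My cat asleep on the sofa.</p>'
    '<p><img src="x.png"></p>'
)
pairs = extract_pairs(page)
```

The first image ends up with both a human-written alt attribute and a human-written sentence next to it; the second has neither, which is the case where scraping gives you nothing useful.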