by somedude895 971 days ago
> they are getting stereotypes instead of the average.

That sort of makes sense though. The training data is labeled images, and a picture of an average Indian in, say, an Indian newspaper, or a picture someone posts of themselves on their own blog, won't be labeled "Indian", since within that context the nationality either doesn't matter or is a given. The training data would have to include context, with a rule like "if source url tld = .in, then add 'India' to label". But that adds a whole host of other issues.
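
A minimal sketch of that TLD rule, just to make it concrete. Everything here is hypothetical: the `TLD_TO_COUNTRY` table, the `augment_caption` name, and the idea that captions are plain strings. It is not how any real dataset pipeline is known to work.

```python
from urllib.parse import urlparse

# Hypothetical lookup table; only a few country-code TLDs, for illustration.
TLD_TO_COUNTRY = {"in": "India", "jp": "Japan", "br": "Brazil"}


def augment_caption(caption: str, source_url: str) -> str:
    """Append a country tag inferred from the source URL's TLD, if there is one."""
    host = urlparse(source_url).hostname or ""
    tld = host.rsplit(".", 1)[-1].lower()  # last dot-separated part of the host
    country = TLD_TO_COUNTRY.get(tld)
    # Leave the caption alone if the TLD is unknown or the caption already
    # contains the country name (simple substring check, so "Indian" counts).
    if country is None or country.lower() in caption.lower():
        return caption
    return f"{caption}, {country}"


print(augment_caption("man reading a newspaper", "https://example.in/photo.jpg"))
print(augment_caption("man reading a newspaper", "https://example.com/photo.jpg"))
```

Even in this toy form the weakness shows: the TLD says where the page is hosted, not who is in the picture, so it is at best a noisy proxy.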

Someone correct me if I'm wrong.

1 comment

The image model knows what images look like even without prompts, and if you train it on a trillion images it will create a latent space where similar pictures have similar embeddings. Inaccurate captions for some of them may mean that the text encoder can't get you to those embeddings, but they're still in there.

What this means is that text prompting is a bad way to drive an image-generating model.
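
A toy illustration of the latent-space point, assuming made-up 3-dimensional embeddings (a real model would compute much larger ones from pixels). The names `IMAGE_EMBEDDINGS`, `cosine`, and `nearest` are hypothetical. It only shows that similar images can be found from an example image, with no caption or text encoder involved.

```python
import math

# Hypothetical embeddings, hand-picked so the two portraits sit close together.
IMAGE_EMBEDDINGS = {
    "street_portrait_a": [0.9, 0.1, 0.2],
    "street_portrait_b": [0.8, 0.2, 0.3],
    "festival_costume": [0.1, 0.9, 0.4],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query_name):
    """Return the name of the other image whose embedding is most similar."""
    query = IMAGE_EMBEDDINGS[query_name]
    others = (name for name in IMAGE_EMBEDDINGS if name != query_name)
    return max(others, key=lambda name: cosine(query, IMAGE_EMBEDDINGS[name]))


print(nearest("street_portrait_a"))
```

Here the query is itself an image embedding, so a missing or wrong caption on `street_portrait_b` would not stop it from being found.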