by cma 1280 days ago
> and they don't need to be "high quality", a man will almost always have 2 arms, 2 legs etc...

It feels like the next generation's training set will be inbred on the flood of Stable Diffusion images with 7 mangled fingers, heads coming out of legs, etc.


LAION-400M/5B will obviously not change (and there is enough data there to train a really good model). If a future dataset does include AI-generated images, they will be highly curated: each image was chosen by the person using the model and probably shared further by other users, so it would work like a complicated form of Reinforcement Learning from Human Feedback (RLHF). On top of that, AI-generated images will usually have keywords such as "AI" or "Midjourney" in the caption, so the model can learn to distinguish them from the rest of the dataset. Classifier-free guidance (CFG) also comes to the rescue when the dataset is noisy.
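
For anyone unfamiliar with the CFG part, here is a rough sketch of the guidance step, assuming the usual formulation where the sampler mixes an unconditional and a caption-conditioned noise prediction. The function name `cfg_combine` and the toy vectors are made up for illustration; a real model would produce these predictions as tensors.

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance on two noise predictions.

    Starts at the unconditional prediction and moves along the
    direction (conditional - unconditional), scaled by guidance_scale.
    A scale of 1.0 returns the conditional prediction unchanged;
    larger values push the sample harder toward the caption.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# Toy stand-ins for model outputs (hypothetical values, not real predictions).
eps_uncond = [0.0, 1.0, -1.0]
eps_cond = [1.0, 1.0, 0.0]

guided = cfg_combine(eps_uncond, eps_cond, 2.0)
print(guided)
```

The idea is that a scale above 1 amplifies whatever the caption contributes, which is why captions that reliably mark AI-generated images would give the model something to steer with.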