Hacker News
by yreg 1375 days ago
Is Imagen actually trained on a subset of LAION-5B and nothing else? I've heard they used huge internal datasets.

They have their own datasets and also included LAION-400M, a subset of LAION-5B that was released before the full LAION-5B. There is a short explanation in the "Limitations and Societal Impact" section of the Imagen page: https://imagen.research.google/

> While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.