
They used their own datasets and also included LAION-400M, a subset of LAION-5B that was released before LAION-5B itself. You can see a short explanation in the "Limitations and Societal Impact" section of the Imagen page: https://imagen.research.google/

> While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.