| HN Mirror

	by simandl 1370 days ago
	They have their own datasets and also included LAION-400M, a subset of LAION-5B that was released before LAION-5B. You can see a short explanation in Imagen's "Limitations and Societal Impact" section at https://imagen.research.google/:

	> While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes.