| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Aedelon 167 days ago
	Survey of 65+ papers on model collapse. Key finding from Dohmatob et al. (ICLR 2025): even 0.1% synthetic contamination in training data causes measurable degradation. No major dataset (FineWeb, RedPajama, C4) currently filters for AI-generated content.