by Aedelon
121 days ago
Survey of 65+ papers on model collapse. Key finding from Dohmatob et al. (ICLR 2025): even 0.1% synthetic contamination in training data causes measurable degradation. No major dataset (FineWeb, RedPajama, C4) currently filters for AI-generated content.