

by rhdunn 593 days ago

It depends on how you construct the synthetic data and how the model is trained on that data.

For diffusion-based image generators, training only on synthetic data over repeated rounds of model training can cause model collapse, because errors in one model's output get amplified in the next model trained on it. It usually takes until the 2nd or 3rd model created this way (with the output of the previous model used as training input for the next) for it to collapse.

It was found that using primary data alongside synthetic data avoided the model collapse. Likewise, if you also have some sort of human scoring/evaluation, that can help avoid artefacts.
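
To make the feedback loop concrete, here is a toy sketch. It is not a diffusion model: a 1-D Gaussian fit stands in for "the model", and the names `fit`, `run_generations` and `real_fraction` are made up for illustration. Each generation is fitted to samples drawn from the previous generation's fit, optionally mixed with some primary data. With tiny sample sizes the estimation error compounds and the fitted spread tends to shrink towards zero, which is a much slower and simpler process than what happens with real image generators, but the same kind of loop.

```python
import random
import statistics


def fit(samples):
    """'Train' the toy model: estimate mean and (population) std dev."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def run_generations(real, generations, n_train, real_fraction, seed=0):
    """Repeatedly refit a Gaussian on its own samples.

    real_fraction is the share of each training set drawn from the
    primary (real) data; the rest is sampled from the previous model.
    Returns the fitted std dev after each generation.
    """
    rng = random.Random(seed)
    mu, sigma = fit(real)
    history = [sigma]
    for _ in range(generations):
        n_real = int(real_fraction * n_train)
        # Synthetic data: output of the previous generation's model.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n_train - n_real)]
        # Primary data: a fresh draw from the original dataset.
        primary = rng.sample(real, n_real)
        mu, sigma = fit(synthetic + primary)
        history.append(sigma)
    return history


if __name__ == "__main__":
    data_rng = random.Random(42)
    real = [data_rng.gauss(0.0, 1.0) for _ in range(1000)]

    pure = run_generations(real, generations=300, n_train=20, real_fraction=0.0)
    mixed = run_generations(real, generations=300, n_train=20, real_fraction=0.5)

    print(f"synthetic only: std {pure[0]:.3f} -> {pure[-1]:.3g}")
    print(f"50% primary:    std {mixed[0]:.3f} -> {mixed[-1]:.3g}")
```

In the synthetic-only run the spread should drift down by orders of magnitude, while the run that keeps mixing in primary data stays anchored near the original spread. The sample size, generation count and 50% mix are arbitrary choices to make the effect visible, not numbers from any paper.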