by gmartinsribeiro
1101 days ago
This is not a model problem or a synthetic data problem.
It is ordinary data science, and the article says as much: "We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear."
Data quality matters more than data volume, and if you forget that: garbage in, garbage out. Make sure your training dataset is representative; whether it is real or synthetic doesn't matter.
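
To make the "tails disappear" point concrete, here is a toy sketch, not the paper's experiment. It assumes the simplest possible "model": the empirical distribution of its own training set, so each new generation is just a resample of the previous one. The names resample_generations and real_data are made up for illustration.

```python
import random


def resample_generations(data, generations, seed=0):
    """Repeatedly 'retrain' on samples drawn from the previous generation.

    Assumption: the 'model' is only the empirical distribution of its
    training set, so each generation is a resample with replacement.
    Returns the number of distinct values present at each generation.
    """
    rng = random.Random(seed)
    current = list(data)
    distinct_counts = [len(set(current))]
    for _ in range(generations):
        # Draw a new training set of the same size from the current one.
        current = rng.choices(current, k=len(current))
        distinct_counts.append(len(set(current)))
    return distinct_counts


# Hypothetical "real" data: one common value plus 100 rare ones (a long tail).
real_data = [0] * 900 + list(range(1, 101))
counts = resample_generations(real_data, generations=20)
print("distinct values, first vs last generation:", counts[0], counts[-1])
```

A value that fails to be drawn in one generation can never come back, so the count of distinct values can only stay flat or shrink, and the rare values are the first to go. Mixing fresh, representative data back in at each step is what breaks that one-way loss.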