by brianr
1156 days ago
This analysis misses the impact of AI models being deployed, as is happening rapidly right now. Production applications built on AI will provide ample (infinite?) additional training data to feed back into the underlying models.

It seems "good enough" (for now), but synthetic data makes up a very small proportion of the training set in the current models that have been trained on it. If the training set ends up being mostly synthetic, we'll likely see whatever weird hallucinations and biases exist in the dominant backend (GPT4 or whatever) become amplified.
It's been shown repeatedly that garbage in = garbage out for training data.