by MacsHeadroom
830 days ago
This is exactly right. Model collapse does not exist in practice. In fact, LLMs trained on newer web scrapes have increased capabilities thanks to the generated output in their training data. For example, "base" pretrained models trained on scrapes that include generated outputs can follow instructions zero-shot and score higher on reasoning benchmarks. Intentionally produced synthetic training data takes this a step further. For SoTA LLMs such as Phi-2 and Claude 3, the majority of the training data, or all of it, is generated.
Granted, one could argue that this only happened because the API version of Claude doesn't appear to use a system prompt. If that's the case, then the LLM lacks the identity that the system prompt would otherwise define, and so it more or less makes one up.
Nonetheless, the point remains: it's interesting that, in the years since ChatGPT launched, we're already seeing a tangible impact on publicly available training data. LLMs "know" what ChatGPT is, and may even claim to be it.