"Model collapse" isn't real. It's a laboratory failure mode that doesn't happen in real world environments.
It's popular because some people latched onto the idea - desperately wanting something to stop the AI tech from advancing. It, quite obviously, doesn't stop the AI tech from advancing.
Now, you can write an entire research paper on why model collapse happens or fails to happen. But a simple way to think of it is: looping AI onto itself multiple times amplifies that AI's own deficiencies, distortions and idiosyncrasies - until, after enough iterations, they come to completely dominate its outputs.
This doesn't apply at all to training an LLM on Whisper outputs that are, in turn, based on human-generated videos. The LLM will inherit some Whisper quirks, but most of the data in Whisper outputs comes from the videos themselves.
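To make the distinction concrete, here is a toy sketch, not anyone's actual training setup. The "model" is just a Gaussian fitted to samples, and the names `fit`, `run_loop`, and `real_fraction` are made up for illustration. With `real_fraction=0.0`, each generation trains only on the previous generation's outputs. With `real_fraction=0.5`, half of every training set is fresh data from the real source, loosely analogous to Whisper transcripts still being anchored to human-made videos.

```python
import random
import statistics


def fit(samples):
    """Fit the toy 'model': the sample mean and (population) standard deviation."""
    return statistics.fmean(samples), statistics.pstdev(samples)


def run_loop(generations, n, real_fraction, seed=0):
    """Repeatedly refit a Gaussian on a mix of its own samples and real data.

    real_fraction is the share of each generation's training set drawn from
    the true source, assumed here to be N(0, 1). Returns the fitted standard
    deviation after each generation.
    """
    rng = random.Random(seed)

    def real(k):
        # Fresh samples from the real-world source (an assumption of this toy).
        return [rng.gauss(0.0, 1.0) for _ in range(k)]

    mu, sigma = fit(real(n))  # generation 0 is trained on real data only
    history = [sigma]
    for _ in range(generations):
        n_real = int(n * real_fraction)
        # The rest of the training set is the current model's own output.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        mu, sigma = fit(synthetic + real(n_real))
        history.append(sigma)
    return history


closed = run_loop(generations=500, n=20, real_fraction=0.0)
grounded = run_loop(generations=500, n=20, real_fraction=0.5)
print(f"closed loop:   final sigma = {closed[-1]:.3g}")
print(f"grounded loop: final sigma = {grounded[-1]:.3g}")
```

With these settings the closed loop's spread should shrink toward zero, since small estimation errors compound generation after generation, while the grounded loop should stay in the neighborhood of the real distribution. It says nothing quantitative about real LLMs. It only shows why "fed purely on its own outputs" and "fed on outputs derived from real data" are different regimes.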
Personally I don't believe in model collapse. Has anyone demonstrated it occurring in the wild, outside of the tiny set of papers that deliberately caused it to happen?
I think model collapse gets talked about so much because it is irresistible schadenfreude. The idea of models eating their own tails in a way that leads to their inevitable demise is captivating to a lot of people, especially AI skeptics.
I agree. A partial counterexample is the RL training loop on verifiable tasks, which uses the model in a loop to generate training data. Another one is the cleanup/prioritization of the pretraining data using earlier models.
More generally, a lot of ideas were extrapolated from very tiny models in controlled settings, and they didn't pan out in real LLMs. There is probably a minimum compute threshold for overcoming generalization traps.