by groby_b
719 days ago
You're confusing it with data poisoning. Model collapse itself is (was?) a fairly serious research topic: https://arxiv.org/abs/2305.17493. We've by now reached "probably not inevitable": https://arxiv.org/abs/2404.01413 argues there's a finite upper bound on the error. But I'd also point out that that paper assumes the training set grows with each training generation and is strictly accumulative, i.e. synthetic data is added to the earlier data rather than replacing it. To a first order, that means you'd better have a pre-2022 dataset to get started, and have archived it well. It's probably fair to say current SOTA is still more or less "it's neither impossible nor inevitable".
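A toy sketch of the replace-vs-accumulate distinction, not taken from either paper: repeatedly fit a Gaussian to the data, sample from the fit, and either throw the old data away or keep it. The names `fit` and `simulate`, the sample size, and the generation count are all made up for illustration, and a one-dimensional Gaussian is obviously a long way from an LLM.

```python
import math
import random


def fit(data):
    """Return the sample mean and standard deviation of data."""
    mu = sum(data) / len(data)
    var = sum((x - mu) ** 2 for x in data) / (len(data) - 1)
    return mu, math.sqrt(var)


def simulate(accumulate, n=10, generations=500, seed=0):
    """Refit a Gaussian each generation; return the fitted sigma per generation."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # stand-in for "real" data
    sigmas = []
    for _ in range(generations + 1):
        mu, sigma = fit(data)
        sigmas.append(sigma)
        # Sample the next generation's synthetic data from the current fit.
        synthetic = [rng.gauss(mu, sigma) for _ in range(n)]
        # Accumulate: train on everything so far. Replace: only the newest samples.
        data = data + synthetic if accumulate else synthetic
    return sigmas


if __name__ == "__main__":
    print("replace:   ", simulate(accumulate=False)[-1])
    print("accumulate:", simulate(accumulate=True)[-1])
```

In the replace case the fitted sigma tends to shrink toward zero as generations pass, while in the accumulate case it tends to stay close to where it started. Exact numbers depend on the seed and the sample size.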
> To a first order, that means you better have a pre-2022 dataset to get started, and have archived it well.
I think that will always be available, or at least, a dataset with the distribution you want will be available.