|
|
|
|
|
by conartist6
159 days ago
|
|
I don't see how you can stop the LLMs ingesting any poison either, because they're filling up the internet with low-value crap as fast as they possibly can. All that junk is poisonous to training new models. The wellspring of value once provided by sites like StackoverFlow is now all but dried up. AI culture is devaluing at an incredible rate as it churns out copied and copies and copies and more copies of the same worthless junk. |
|
It goes further than that—they do lots of testing on the dataset to find the incremental data that produces best improvements on model performance, and even train proxy models that predict whether data will improve performance or not.
“Data Quality” is usually a huge division with a big budget.