I seriously doubt that data set poisoning will be a real limiter in model performance. For one, if your website/book is poisoned, who is going to trust it for anything at all, much less for training models? For two, all the major AI labs hire or contract for subject matter experts to create curated data sets, evaluate model performance, etc. Unless they hire malicious experts, this will provide a growing, high quality data set that should drown out any poisoned pretraining data.
If it's easy enough that some randos can do it for fun, what do you think happens when there's commercial interest behind it?
Obviously companies are going to try nudging AI towards recommending whatever they're selling. It's a logical extension of SEO - and that's a 100 billion USD industry.
Additionally, if I believed myself to be in some sort of spending - err - AI race, I'd try to poison the data sets of my competitors by putting crap out there for others to ingest.