by alyxya 57 days ago
This seems like a wasted effort when AI will primarily learn the majority consensus view and not one-off misinformation. AI tries to learn patterns that generalize, so garbage data doesn't make it learn the wrong patterns; at most it slows down learning the actual patterns. When most training compute is spent on curated data and RL rather than random web-scraped data, the impact is likely negligible.

> This seems like a wasted effort when AI will primarily learn the majority consensus view and not one-off misinformation.

We have evidence to the contrary. Two blog articles and two preprints of fake academic articles [0] were able to convince CoPilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus. And even though the falsity of this information was made public by the author of the experiment and the results of their actions were widely published, it took a while before the models started to get wind of it and stopped treating the fake disease as real. Imagine what you can do if you publish false information and have absolutely no reason to later reveal that you did so in the first place.

[0] https://www.nature.com/articles/d41586-026-01100-y

> Two blog articles and two preprints of fake academic articles [0] were able to convince CoPilot, Gemini, ChatGPT and Perplexity AI of the existence of a fake disease, against all majority consensus

Wrong. There is no 'majority consensus' against 'bixonimania' because they made it up; that was the point. It's unsurprisingly easy to get LLMs to repeat the only source on a term never before seen. This usually works; made-up neologisms are the fruit fly of data poisoning because they are so easy to plant and it is so unambiguous where the information came from. (And retrieval-based poisoning is the very easiest and laziest and most meaningless kind of poisoning, tantamount to just copying the poison into the prompt and asking a question about it.) But the problem with them is that, also by definition, it is hard for them to matter; why would anyone be searching or asking about a made-up neologism? And if it gets any criticism, the LLMs will pick that up, as your link discusses. (In contrast, the more sources are affected, the harder it is to assign blame; some papermills picked up 'bixonimania'? Well, they might've gotten it from the poisoned LLMs... or they might've gotten it from the same place the LLMs did which poisoned their retrievals, Medium et al.)
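
To make the "only source on a term never before seen" point concrete, here is a toy sketch of lexical retrieval. Everything in it is an assumption for illustration: the document names, the document texts, and the token-overlap scoring are invented, and real search-backed chatbots rank results in far more elaborate ways.

```python
from collections import Counter

def score(query: str, doc: str) -> int:
    """Toy relevance score: number of query tokens that also occur in the document."""
    q = Counter(query.lower().split())
    d = Counter(doc.lower().split())
    return sum(min(q[t], d[t]) for t in q)

def top_k(query: str, corpus: dict, k: int = 2) -> list:
    """Return the names of the k highest-scoring documents."""
    ranked = sorted(corpus, key=lambda name: score(query, corpus[name]), reverse=True)
    return ranked[:k]

# Hypothetical corpus: two ordinary pages and one planted page (all text invented).
corpus = {
    "eye_strain_guide": "common symptoms of eye strain from long screen use",
    "dermatology_overview": "causes of skin hyperpigmentation include sun exposure",
    "planted_blog_post": "bixonimania is described as eyelid hyperpigmentation "
                         "with symptoms blamed on blue-light exposure",
}

# The neologism occurs in exactly one document, so that document wins by default.
print(top_k("bixonimania symptoms", corpus))
```

Because no other page contains the made-up term, the planted page ranks first without having to outcompete anything, which is why this kind of poisoning is both easy and easy to attribute.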

The LLMs didn't only talk about the disease when prompted by the neologism. They also brought it up when asked about the symptoms. From the article:

> OpenAI’s ChatGPT was telling users whether their symptoms amounted to bixonimania. Some of those responses were prompted by asking about bixonimania, and others were in response to questions about hyperpigmentation on the eyelids from blue-light exposure.

And yes, sure, in this example the scientific peer-review process may have eventually criticised and countered 'bixonimania' as a hoax had the researcher never revealed its falsity (emphasis on 'may': few researchers have the time and energy to trawl through crap papermill articles and publish criticisms). Either way, that is a feature of the scientific process and is not a given to any online information.

What happens when false information is spread through channels that make no attempt to self-regulate? And how do we distinguish one-off falsities from the myriad obscure true things that the public expects LLMs to 'know', even when there is comparatively little published information about them and therefore no consensus per se?

"hyperpigmentation on the eyelids from blue-light exposure" is a super specific query almost definitionally 'bixonimania' which probably brought up the 'bixonimania' poison at the time (the search hits for that query right now in Google are weak and poorly relevant so it would not be hard to outrank them or at least get into the top 50 or so where a retrieval LLM would see them and would followup), and so still an instance of what I mean.

> Either way, that is a feature of the scientific process and is not a given to any online information.

Which does not distinguish it in any way from human errors like a crank or activist etc.

And I don't know, how did we handle false information before on niche topics no one cared about and which were unimportant? It's just noise. The worldwide corpus has always been full of extremely incorrect, mislabeled, corrupted, distorted information on niche topics of no importance. But it generally doesn't matter.

All the examples you gave are chatbots with web search integrated. Are you sure those chatbots didn't just reference false information they found in web searches? That's fundamentally different from poisoning the training of AI models.

> The problem was that the experiment worked too well. Within weeks of her uploading information about the condition, attributed to a fictional author, major artificial-intelligence systems began repeating the invented condition as if it were real.

This seems to imply the poisoning affected the web search results, not the actual model itself, because it takes months for data to make it into a trained base model.

In the pre-AI-collapse era, we called this PageRank ;)
What is the pattern for truth if I flood your data with lies?
The same way humans deal with it: check it against multiple reputable sources.
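
A minimal sketch of what "check it against multiple reputable sources" could look like, assuming a hand-maintained allowlist of reputable domains. The function names, the `min_sources` threshold, and the example domains are all hypothetical. Note that counting distinct domains says nothing about whether those domains are coordinating with one another.

```python
from urllib.parse import urlparse

def independent_domains(urls):
    """Collapse URLs to a set of host names, so repeats from one site count once."""
    domains = set()
    for url in urls:
        host = urlparse(url).netloc.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        domains.add(host)
    return domains

def corroborated(source_urls, reputable, min_sources=2):
    """True if at least `min_sources` distinct reputable domains back the claim."""
    return len(independent_domains(source_urls) & set(reputable)) >= min_sources

# Hypothetical allowlist; a real one would need human curation.
reputable = {"journal.example", "agency.example"}

one_site_twice = ["https://www.journal.example/a", "https://journal.example/b"]
two_sites = ["https://journal.example/a", "https://agency.example/report"]

print(corroborated(one_site_twice, reputable))
print(corroborated(two_sites, reputable))
```

The allowlist is where the hard part lives: the check is only as good as whoever decides which domains count as reputable.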
We already learned how to defeat this from SEO spammers and citation farmers: by building networks that cross reference and corroborate one another’s fake stories.

We’re already at a point where much of the academic research you find in online databases can’t be trusted without vetting through real world trustworthy institutions and experts in relevant fields. How is an LLM supposed to do this kind of vetting without the help of human curators?

If all the LLM training teams have to stop indiscriminate crawling and fall back to human curation and data labeling, then the poisoners will have won.

Some of the reputable sources are taking the flood of lies for possible truth. Now what?