| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by kevinbojarski 64 days ago
	I wouldn't be so confident that poisoning won't work. https://www.reddit.com/r/BrandNewSentence/comments/1so9wf1/c...

2 comments

phainopepla2 64 days ago

LLM poisoning is about getting bad data into the training set. There is zero chance that this comment from 3 days ago was part of the training data for any currently public LLM.

Assuming the LLM actually got its answer from that comment, it was from a web search.

link

tomjakubowski 64 days ago

I mean, if an LLM, when given a query not in its training data, resorts to searching Google and then summarizes those results as the truth with 100% certainty, because, fuck it: YOLO… I'm already very capable of doing that myself, thank you. What's the point, even?

link

Legend2440 64 days ago

Whatever's happening here, it's not training data poisoning.

Models are retrained only every few months at best; it is not possible for a comment made a few hours earlier to be in the training data yet.

link

solaire_oa 64 days ago

Yeah this is context poisoning, not model poisoning, which is way, way more effective.

Google and Reddit have contracts: Google has official scraping access to Reddit (probably more than that at this point since the contracts were signed 1-2 years ago). But the fact that Reddit does a good job at moderating human content makes it a boon for plausibly "up-to-date" info (which a model doesn't have). Google's LLM summaries even include Reddit as its foremost "citations".

Anyway, Google does a RAG or something similar for its LLM responses, and takes Reddit info at face value. I'm very interested to see what the "thresholds" are, like how much context poisoning do you need to be effective. If the above link is reliable then the answer is "mere sentences".

Certainly bad-actor merchants would try this sort of thing on merchandise subreddits; welcome to the new AIO/GEO everyone.

link