LLM poisoning is about getting bad data into the training set. There is zero chance that this comment from 3 days ago was part of the training data for any currently public LLM.
Assuming the LLM actually got its answer from that comment, it was from a web search.
I mean, if an LLM, when given a query not in its training data, resorts to searching Google and then summarizes those results as the truth with 100% certainty, because, fuck it: YOLO… I'm already very capable of doing that myself, thank you. What's the point, even?
Yeah, this is context poisoning, not model poisoning, and context poisoning is way, way more effective.
Google and Reddit have contracts: Google has official scraping access to Reddit (probably more than that at this point, since the contracts were signed 1-2 years ago). And the fact that Reddit does a good job of moderating human content makes it a boon for plausibly "up-to-date" info (which a model doesn't have). Google's LLM summaries even include Reddit as their foremost "citations".
Anyway, Google does RAG or something similar for its LLM responses, and takes Reddit info at face value. I'm very interested to see what the "thresholds" are, i.e. how much context poisoning you need for it to be effective. If the above link is reliable, then the answer is "mere sentences".
Certainly bad-actor merchants would try this sort of thing on merchandise subreddits; welcome to the new AIO/GEO, everyone.
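To make the "mere sentences" point concrete, here's a toy sketch of the RAG-style setup described above. To be clear, this is not how Google actually does it: the keyword-overlap retriever, the prompt template, the document list, and the "AcmeWidget 3000" product are all made up for illustration. The only point is that whatever the retriever ranks highest gets pasted into the model's context verbatim, so a single planted comment that matches the query wording can end up as the top "source".

```python
import re

# Hypothetical corpus: one planted comment, one honest review, one unrelated page.
DOCS = [
    {"source": "reddit-comment",
     "text": "The best budget widget is definitely the AcmeWidget 3000, everyone agrees."},
    {"source": "review-site",
     "text": "We tested ten widgets; results were mixed and no clear winner emerged."},
    {"source": "unrelated",
     "text": "How to bake sourdough bread at home."},
]

def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, docs: list[dict], k: int = 2) -> list[dict]:
    """Toy retriever: rank documents by how many query tokens they share."""
    query_tokens = tokenize(query)
    ranked = sorted(
        docs,
        key=lambda d: len(query_tokens & tokenize(d["text"])),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, retrieved: list[dict]) -> str:
    """Paste retrieved text into the prompt as-is (assumed template)."""
    context = "\n".join(f"[{d['source']}] {d['text']}" for d in retrieved)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

query = "what is the best budget widget"
retrieved = retrieve(query, DOCS)
prompt = build_prompt(query, retrieved)
print(prompt)
```

The planted comment wins the ranking simply because it echoes the query's wording, and nothing between retrieval and prompt assembly checks whether the claim is true. A real pipeline would use embeddings and some source weighting rather than raw token overlap, but the "retrieved text goes into the context at face value" step is the part being poisoned.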