

by dTal 565 days ago

The very worst that would happen is that you make someone's training run slightly less efficient. If your data is truly random garbage, the model won't be able to make any predictions about it and thus it will not distort performance. All training data is noisy to an extent, and you've just fed it pure noise.
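To make the "can't predict pure noise" point concrete, here is a minimal sketch, assuming the garbage is drawn uniformly from a toy vocabulary of four symbols. The function name and the example distributions are made up for illustration. When targets are uniform, the expected cross-entropy is lowest for a uniform guess, so the model has nothing to gain by leaning toward any symbol.

```python
import math

def expected_cross_entropy(q):
    """Expected loss (in nats) of predictive distribution q
    when the true next symbol is uniform over len(q) symbols."""
    v = len(q)
    return -sum(math.log(p) for p in q) / v

# Hypothetical 4-symbol vocabulary.
uniform_guess = [0.25, 0.25, 0.25, 0.25]
skewed_guess = [0.7, 0.1, 0.1, 0.1]

loss_uniform = expected_cross_entropy(uniform_guess)  # equals ln(4)
loss_skewed = expected_cross_entropy(skewed_guess)    # strictly larger

print(loss_uniform, loss_skewed)
```

Any confident guess does worse than shrugging, which is why noise of this kind gives the model no pattern to pick up.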

However, it has become clear that effective LLM training is in large part a matter of careful curation of high-quality training data. Random gibberish is trivially detectable, by LLMs themselves if nothing else, so it's unlikely that your "honeypot" will ever make it into someone's training run.
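As a rough illustration of how cheap detection can be, here is a toy filter that flags text containing almost no common English words. This is only a sketch: the word list, the 0.2 threshold, and the function names are my own assumptions, not anything a real pipeline is known to use, and an actual curation step would more plausibly rely on a language model's perplexity or a trained quality classifier.

```python
import string

# Small, hand-picked list of very common English words (an assumption).
COMMON_WORDS = {
    "the", "a", "an", "of", "to", "and", "in", "is", "it", "that",
    "you", "be", "if", "will", "not", "your", "would", "this", "for",
}

def common_word_rate(text):
    """Fraction of whitespace-separated tokens that are common words."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in COMMON_WORDS for t in tokens) / len(tokens)

def looks_like_gibberish(text, threshold=0.2):
    """Flag text whose common-word rate falls below the threshold."""
    return common_word_rate(text) < threshold

english = ("The very worst that would happen is that you make "
           "someone's training run slightly less efficient.")
noise = "xqzv kjhw pplq mnrt zzkd wqpf"

print(looks_like_gibberish(english))  # False
print(looks_like_gibberish(noise))    # True
```

Something this crude is easy to fool with text that merely looks like English, which is where the "more subtle poison" case comes in.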

Even if you carefully crafted some more subtle poison data, it would still form only a small fraction of the training set. The worst-case scenario is most likely that the LLM learns to recognize your particular style of poison, and will happily recreate it if prompted appropriately (while otherwise remaining unaffected); more likely, your poison data is simply swamped.
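For a sense of scale, here is the dilution arithmetic as a short sketch. The token counts are hypothetical round numbers picked for illustration, not figures from any real training run.

```python
def poison_fraction(poison_tokens, clean_tokens):
    """Share of the combined corpus made up of poison tokens."""
    return poison_tokens / (poison_tokens + clean_tokens)

# Hypothetical sizes: 10 million poison tokens, 1 trillion clean tokens.
fraction = poison_fraction(10_000_000, 1_000_000_000_000)

print(f"{fraction:.6%}")  # roughly 0.001% of the corpus
```

Under those assumed sizes, about one token in a hundred thousand comes from the poison, and that is before any filtering.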