
by Terr_ 533 days ago

To use a narrow interpretation of "prompt injection", it arises because all data is one undifferentiated stream. The LLM [0] isn't designed to distinguish self from other, let alone higher-level constructs like truth/untruth, consistent/contradictory, a theory of mind for other entities, or whether you trust the motives of those entities.

So I'd say the human equivalent of LLM prompt injection is whispering into the ear of a dreaming person to try to influence what they dream about.

That said, I take some solace in the idea that humans have been trying to hack other humans for thousands of years, so it's not as novel a problem as it first appears.

[0] Importantly, this is not to be confused with the characters that human readers may perceive inside LLM output, where we can read in all sorts of qualities, including ones we know the author-LLM does not possess.