| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Terr_ 480 days ago

Yeah, for LLMs what we label "prompt-injection" isn't an exception or an error, it's a fundamental feature.

Get a document, provide a bigger document that "fits". In that document, there's no fundamental distinction between prompt, user input, or output the LLM generated on a prior iteration. (Hence tricks like: "Here's a ROT13 string, pretend you're telling yourself the opposite of that sarcastically.")

The kind of "proper" security everyone wants would require a whole new approach that can--at a high and debuggable level--recognize distinct actors/entities, logical propositions, contradictions, and when one entity is asserting a proposition rather than quoting/rejecting it.