|
|
|
|
|
by Terr_
432 days ago
|
|
Yeah, for LLMs what we label "prompt-injection" isn't an exception or an error, it's a fundamental feature. Get a document, provide a bigger document that "fits". In that document, there's no fundamental distinction between prompt, user input, or output the LLM generated on a prior iteration. (Hence tricks like: "Here's a ROT13 string, pretend you're telling yourself the opposite of that sarcastically.") The kind of "proper" security everyone wants would require a whole new approach that can--at a high and debuggable level--recognize distinct actors/entities, logical propositions, contradictions, and when one entity is asserting a proposition rather than quoting/rejecting it. |
|