| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bandrami 3 hours ago
	Maybe I'm missing something but does this idea need a "theory"? There's zero sideband here; everything is just context. "Injection" is just kind of baked in to the design.

4 comments

geoffschmidt 2 hours ago

I think their work earns "theory" because it makes specific predictions both about how to make more effective prompt injection attacks and what activations you'd observe in the LLM during those attacks, and can also be plausibly extrapolated to suggest useful future research directions.

link

zby 37 minutes ago

They do predict what injections might be effective - so it is a theory. I don't know how novel it is and it is not very deep (as you noted the general mechanism is quite obvious) - but they do it quite systematically so it is useful.

link

yunwal 3 hours ago

At this point I think it's similar to reporting a particularly effective social engineering practice. It's not particularly surprising that it works or that it exists, but it's still noteworthy.

link

joe_the_user 2 hours ago

Well, the original HN title (which has been changed as I write) was the second large text "A Theory of Prompt Injection", which should simply be "A Method Of Prompt Injection Using Roles".

I would say this method is less interesting than the question of whether one needs a discreet theory of why "prompt injections" ("malicious" frame jumps) exist or whether one should assume changing logical frame jumps are present by default in all normal human language (LLM training sets) and all the system prompts and filtering done against so called "prompt injection" are what is going be ad-hoc and without a unified theory.

link

jackb4040 1 hour ago

I was gonna say, anyone who's copy-pasted one LLM conversation into another already intuitively understands all this.

link