Hacker News new | ask | show | jobs
by simonw 598 days ago
That was a mistake I made when I called it "prompt injection" - back then I assumed that the solution was similar to the solution to SQL injection, where parameterized queries mean you can safely separate instructions and untrusted data.

Turns out LLMs don't work like that: there is no reliable mechanism to separate instructions from the data that the LLM has been instructed to act on. Everything ends up in one token stream.

1 comments

For me, things click into place by considering the "conversational" LLM as autocomplete applied into a theatrical script. The document contains stage direction and spoken lines by different actors. The algorithm doesn't know or care how it why any particular chunk of text got there, and if one of those sections refers to "LLM" or "You" or "Server", that is--at best--just another character name connected to certain trends.

So the LLM is never deciding what "itself" will speak next, it's deciding what "looks right" as the next chunk in a growing document compared to all the documents it was trained on.

This framing helps explain the weird mix of power and idiocy, and how everything is injection all the time.