|
|
|
|
|
by crazygringo
950 days ago
|
|
It's not about debuggability, prompt injection is an inherent risk in current LLM architectures. It's like a coding language where strings don't have quotes, and it's up to the compiler to guess whether something is code or data. We have to hope there's going to be an architectural breakthrough in the next couple/few years that creates a way to separate out instructions (prompts) and "data", i.e. the main conversation. E.g. input that relies on two sets of tokens (prompt tokens and data tokens) that can never be mixed or confused with each other. Obviously we don't know how to do this yet and it will require a major architectural advance to be able to train and operate at two levels like that, but we have to hope that somebody figures it out. There's no fundamental reason to think it's impossible. It doesn't fit into the current paradigm of a single sequence of tokens, but that's why paradigms evolve. |
|
One issue is it's very hard to draw the distinction between instructions and data. Are a neural net’s weights instructions? (They're definitely data.) They are not literally executed by the CPU, but in a NN of sufficient complexity (say, in a self driving car, which both perceives and acts), they do control the NN’s actions. An analogous and far more thorny question would be whether our brain state is instruction or data. At any moment in time our brain state (the locations of neurons, nutrients, molecules, whatever) is entirely data, yet that data is realized, through the laws of physics/chemistry, as instructions that guide our bodies’ operation. Those laws are too granular to be instructions per se (they're equivalent to wiring in a CPU). So the data is the instruction.
I think LLMs are in a similar situation. The data in their weights, when it passes through some matrix multiplications, is instructions on what to emit. And there's the rub. The only way to have an LLM where data and instruction never meet, in my view, is one that doesn't update in response to prompts (and therefore can't carry on a multi prompt conversation). As long as your prompt can make even somewhat persistent changes to the model’s state — its data — it can also change the instructions.