|
|
|
|
|
by thewopr
1202 days ago
|
|
I'd argue, they aren't doing something future-proof right now because the fundamental architecture of the LLM makes it nearly impossible to guarantee the model will correctly respond event to special [system] tokens. In your SQL example, the interpreter can deterministically distinguish between "instruct" and "data" (assuming proper escape obviously). In the LLM sense, you can only train the model to pick up on special characters. Even if [system] is a special token, the only reason the model cares about that special token is because it has been statistically trained to care, not designed to care. You can't (??) make the LLM treat a token deterministically, at least not in my understanding of the current architectures. So there may always be an avenue for attack if you consume untrusted content into the LLM context. (At least without some aggressive model architecture changes). |
|
I believe that's the case and, well, there are some problems there. Specifically, it may be an API but the magic happens with this token response, which is nondeterministic and no controllable, as commentator sillysaurusx notes.
IE, you're saying "they're doing anything like security 'cause they do anything like security". To which we'd say yeah.
But please note, LLM architecture makes it hard for this to change.