by nomel
805 days ago
Sure, but that's the issue. You have to treat all input as hostile, yet there's no trivial way to sanitize or contain it, as there is with a user-provided string in an SQL statement. Since a hard, deterministic concept of encapsulating user input can't really exist with next-token prediction, you have to rely on some sort of fine-tuning to try to get the model to understand the concept, and that understanding is usually vulnerable to silly reverse psychology. My question for you is, what is the correct way to use an LLM? How can you accept nontrivial user input without the risk of jailbreak?
So I'm kind of speaking from the spectator peanut-gallery here, as I'm something of an LLM-skeptic, but one scenario I can imagine is where the model helps the user format their own not-so-structured information, where there aren't any (important) secrets anywhere and the input is already user-level/untrusted.
Consider the failure of simple code behind this interaction:
1. "Hi, what's your first name?"
2. "Greetings, my name is Bob."
3. "Okay, Greetings, my name is Bob., next enter your last name."
In contrast, an LLM might be a viable way to take the first two lines plus "Tell me just the user's first name", then a more-deterministic system can be responsible for getting final confirmation that "Bob" is correct before it goes into any important records.
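To make that split concrete, here's a rough Python sketch of what I have in mind, not a real implementation. Everything named here is made up for illustration: `stub_llm` is a hypothetical stand-in for an actual model call, and the `NAME_RE` allow-list is an assumption that is far too strict for real-world names. The point is only that the model's output gets treated as untrusted text, checked by ordinary code, and then confirmed by the user.

```python
import re
from typing import Callable, Optional

# Assumed allow-list: a letter, then up to 39 letters, spaces, hyphens, or
# apostrophes. Far too strict for real names; it only illustrates the check.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z' -]{0,39}")


def naive_first_name(reply: str) -> str:
    """The 'simple code' failure: treat the whole reply as the name."""
    return reply.strip()


def extract_first_name(reply: str, llm: Callable[[str], str]) -> Optional[str]:
    """Ask the model for the name, then validate its answer deterministically."""
    prompt = (
        "Hi, what's your first name?\n"
        f"{reply}\n"
        "Tell me just the user's first name."
    )
    candidate = llm(prompt).strip()
    # The model's output is still untrusted: accept it only if it looks like a name.
    if NAME_RE.fullmatch(candidate):
        return candidate
    return None


def confirm(candidate: str, ask: Callable[[str], str]) -> bool:
    """Deterministic final step: the user confirms before anything is stored."""
    answer = ask(f"Is your first name {candidate!r}? (yes/no) ")
    return answer.strip().lower() in {"y", "yes"}


def stub_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call, so the sketch runs offline."""
    match = re.search(r"my name is (\w+)", prompt, re.IGNORECASE)
    return match.group(1) if match else prompt
```

With the stub, `naive_first_name("Greetings, my name is Bob.")` hands back the whole sentence, while `extract_first_name("Greetings, my name is Bob.", stub_llm)` yields `"Bob"`. If the model gets talked into returning something like `Bob'; DROP TABLE users;--`, the allow-list rejects it and the caller gets `None`, so the worst case is asking the user again.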
A more-ambitious exchange might be:
1. "Hi, what is your legal name?"
2. "My name is Bobby-Joe Von Micklestein. Junior, if it matters."
3. "So your given name is Bobby-Joe and your middle name is Von and your last name is Micklestein, is that correct?"
4. "No, the last name is Von Micklestein, two words."
If the user really wants to get the prompt, it probably won't be anything surprising, and a hostile user trying to elicit bad output [0] poses no greater risk than before, assuming programmers don't get lazy and wrongly trust the new LLM to sanitize things.