by hyperman1
80 days ago
I wonder if it is possible to double all token types. One token is secure, the other is not. User input is always tokenized to the insecure variants. You kinda get a secret language for prompts. Of course, new token kinds are not cheap, and how do you train this thing?
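A rough sketch of what "doubling" the token types could look like at the ID level, assuming a base vocabulary of `VOCAB_SIZE` tokens where IDs below `VOCAB_SIZE` are the secure variants and each insecure twin sits at `id + VOCAB_SIZE`. The names (`VOCAB_SIZE`, `to_insecure`, `build_input`) are made up for illustration and the token IDs are arbitrary; this only shows the bookkeeping, not how a model would be trained on it.

```python
VOCAB_SIZE = 50_000  # hypothetical size of the base (secure) vocabulary


def to_insecure(token_ids):
    """Map base token IDs to their insecure twins by offsetting past the base vocab."""
    for t in token_ids:
        if not 0 <= t < VOCAB_SIZE:
            raise ValueError(f"token id {t} is outside the base vocabulary")
    return [t + VOCAB_SIZE for t in token_ids]


def build_input(system_ids, user_ids):
    """System prompt keeps the secure IDs; user input is always shifted to insecure IDs."""
    return list(system_ids) + to_insecure(user_ids)


def is_secure(token_id):
    """True if the ID belongs to the secure half of the doubled vocabulary."""
    return token_id < VOCAB_SIZE


def base_token(token_id):
    """Recover the underlying base token regardless of which half it came from."""
    return token_id % VOCAB_SIZE


sequence = build_input(system_ids=[1, 2], user_ids=[1, 2])
```

Since the tokenizer never emits secure IDs for user text, nothing the user types can produce them. The cost mentioned above shows up as an embedding table with twice as many rows.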
|
System prompt tokens would get the maximum authority value, and random downloaded data would get the minimum authority value. Tokens from the user prompt could be somewhere in between.
Then train the model with examples that show that system prompts should be respected, and prompt injection attacks should be ignored.
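To make the authority idea concrete, here is a minimal sketch, assuming three discrete authority levels and that the authority is fed to the model as an extra embedding added to each token embedding. The `Authority` enum, `tag_tokens`, `embed`, and the tiny embedding tables are all hypothetical; the training step described above is not shown.

```python
from enum import IntEnum


class Authority(IntEnum):
    """Hypothetical authority levels; higher means more trusted."""
    DOWNLOADED = 0  # random downloaded data
    USER = 1        # user prompt
    SYSTEM = 2      # system prompt


def tag_tokens(segments):
    """Flatten (authority, token_ids) segments into (token_id, authority) pairs."""
    tagged = []
    for authority, ids in segments:
        tagged.extend((t, int(authority)) for t in ids)
    return tagged


def embed(tagged, token_emb, authority_emb):
    """Add the authority embedding to each token embedding, element-wise."""
    return [
        [a + b for a, b in zip(token_emb[t], authority_emb[auth])]
        for t, auth in tagged
    ]


# Toy tables with 2-dimensional embeddings, purely for illustration.
token_emb = {5: [1.0, 0.0], 7: [0.0, 1.0]}
authority_emb = {0: [0.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0]}

tagged = tag_tokens([(Authority.SYSTEM, [5]), (Authority.DOWNLOADED, [5, 7])])
vectors = embed(tagged, token_emb, authority_emb)
```

The same token (ID 5 here) ends up with a different input vector depending on where it came from, which is what would give the training examples something to latch onto.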