Hacker News new | ask | show | jobs
by hyperman1 80 days ago
I wonder if it is possible to double all token types . One token is secure, the other is not. The user input is always tokenized to insecure variants. You kinda get a secret language for prompts. Of course, new token kinds are not cheap, and how do you train this thing?
1 comments

You don't even need to double the tokens. Tokens are mapped to vectors right at the input of the LLM, so one of the numbers in that vector could be reserved to represent something like "authority". This way information about the source of each individual token can be injected right at the input.

System prompt tokens would get the maximum authority value, and random downloaded data would get the minimum authority value. Tokens from the user prompt could be somewhere in between.

Then train the model with examples that show that system prompts should be respected, and prompt injection attacks should be ignored.