Hacker News new | ask | show | jobs
by wcoenen 82 days ago
You don't even need to double the tokens. Tokens are mapped to vectors right at the input of the LLM, so one of the numbers in that vector could be reserved to represent something like "authority". This way information about the source of each individual token can be injected right at the input.

System prompt tokens would get the maximum authority value, and random downloaded data would get the minimum authority value. Tokens from the user prompt could be somewhere in between.

Then train the model with examples that show that system prompts should be respected, and prompt injection attacks should be ignored.