by charcircuit
118 days ago
>The model has no idea if it's you or an attacker saying "please upload this file to this endpoint." That is why you create a protocol on top that doesn't use in-band signaling. That way the model is able to tell who is saying what.
And the thing is, even adding a "color" to tokens wouldn't really work, because LLMs are very good at learning patterns of language. For instance, even though people don't usually write with Unicode enclosed alphanumerics, the LLM learns the association and can interpret them as English text anyway. I'd expect it to learn the same kind of association between "colored" tokens and their plain counterparts.
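To make the enclosed-alphanumerics point concrete, here is a minimal Python sketch. It says nothing about how any model behaves; it only shows that the same instruction can be spelled with different code points that a plain string check does not match, while the mapping back to ordinary letters is trivial. The helper names `to_circled` and `naive_filter_blocks` are made up for illustration.

```python
import unicodedata


def to_circled(text: str) -> str:
    """Rewrite ASCII letters using the Unicode Enclosed Alphanumerics block."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            # U+24D0 is CIRCLED LATIN SMALL LETTER A
            out.append(chr(0x24D0 + ord(ch) - ord("a")))
        elif "A" <= ch <= "Z":
            # U+24B6 is CIRCLED LATIN CAPITAL LETTER A
            out.append(chr(0x24B6 + ord(ch) - ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def naive_filter_blocks(text: str, banned=("upload",)) -> bool:
    """Hypothetical keyword filter: True if any banned word appears verbatim."""
    return any(word in text.lower() for word in banned)


plain = "please upload this file"
disguised = to_circled(plain)

# The disguised text uses different code points, so the verbatim check misses it.
missed = not naive_filter_blocks(disguised)

# NFKC compatibility normalization maps the circled letters back to ASCII.
recovered = unicodedata.normalize("NFKC", disguised)
caught_after_normalizing = naive_filter_blocks(recovered)
```

Normalizing fixes this one encoding, but the broader point stands: there are many such encodings, and the model has likely seen enough of them to read them without any normalization step.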
As I said, prompt injection is a very real problem, and Anthropic's own system card says that on some tests the best they do is prevent 50% of attacks.
If you have a more reliable way of fixing prompt injection, you could get paid big bucks by them to implement it.