by wll
1131 days ago
I believe we can identify and fix attempts to evade detection. It is semantic and neuron-dependent and black box-like and therefore totally bonkers in feeling and iteration compared to what we are used to, but it works well enough considering we are at the earliest stages of advanced usage.
We are currently starting to wire LLMs up as AI-enhanced personal assistants - with the goal of giving them access to our email, and the ability to take actions on our behalf.
If we widely deploy these systems, the incentives for attackers to figure out prompt injection attacks that get past any probability-based filters we are using will be enormous.
An attacker only needs to get lucky with their prompt attacks once.
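To put rough numbers on "once": here is a minimal sketch of the arithmetic, assuming a hypothetical filter that catches 99% of injection attempts and assuming each attempt is an independent try. Both the 99% figure and the independence assumption are mine, purely for illustration, and the function name is made up.

```python
def p_at_least_one_bypass(catch_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` tries slips past the filter.

    Assumes (hypothetically) that every attempt is caught independently
    with the same probability `catch_rate`.
    """
    if not 0.0 <= catch_rate <= 1.0:
        raise ValueError("catch_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # All attempts are caught with probability catch_rate ** attempts,
    # so at least one gets through with the complement of that.
    return 1.0 - catch_rate ** attempts


if __name__ == "__main__":
    # 0.99 is an illustrative catch rate, not a measured one.
    for n in (1, 10, 100, 1000):
        print(f"{n:>5} attempts -> {p_at_least_one_bypass(0.99, n):.3f}")
```

Under those assumptions the odds of a single bypass climb quickly with the number of attempts, and an attacker can automate attempts cheaply. Real filters and real attacks are not independent coin flips, so treat this as intuition rather than a model.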
I wrote about the larger threat introduced by these new applications here: https://simonwillison.net/2023/Apr/14/worst-that-can-happen/