I'll start with a quote from gwern on LessWrong: "... a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not 'plugging updated facts into your AI', you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well."
The instructions are manipulating the LLM itself. Making it exfiltrate and collect data , fetch new instructions from an attacker etc. All the connected applications can be fine but it's basically turning your assistant into a compromised, attacker-controlled version of itself just because it looked at the wrong news article. From our GitHub:
We demonstrate the potentially brutal consequences of giving LLMs like ChatGPT interfaces to other applications. We propose newly enabled attack vectors and techniques and provide demonstrations of each in this repository:
Remote control of chat LLMs
Persistent compromise across sessions
Spread injections to other LLMs
Compromising LLMs with tiny multi-stage payloads
Leaking/exfiltrating user data
Automated Social Engineering
Targeting code completion engines
All of these are completely new but unfortunately it seems more difficult to explain the impact to people than we had anticipated.
The instructions are manipulating the LLM itself. Making it exfiltrate and collect data , fetch new instructions from an attacker etc. All the connected applications can be fine but it's basically turning your assistant into a compromised, attacker-controlled version of itself just because it looked at the wrong news article. From our GitHub:
We demonstrate the potentially brutal consequences of giving LLMs like ChatGPT interfaces to other applications. We propose newly enabled attack vectors and techniques and provide demonstrations of each in this repository:
All of these are completely new but unfortunately it seems more difficult to explain the impact to people than we had anticipated.