Hacker News new | ask | show | jobs
by greshake 1212 days ago
No, the "malware" is running on the language model itself. It does not need to inject any code into connected applications and exploit them to be itself exploited (I'm the main author).
1 comments

What do you mean by "it is running on LLM?". I still don't get it. Can you clarify it even more?
I'll start with a quote from gwern on LessWrong: "... a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not 'plugging updated facts into your AI', you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well."

The instructions are manipulating the LLM itself. Making it exfiltrate and collect data , fetch new instructions from an attacker etc. All the connected applications can be fine but it's basically turning your assistant into a compromised, attacker-controlled version of itself just because it looked at the wrong news article. From our GitHub:

We demonstrate the potentially brutal consequences of giving LLMs like ChatGPT interfaces to other applications. We propose newly enabled attack vectors and techniques and provide demonstrations of each in this repository:

    Remote control of chat LLMs

    Persistent compromise across sessions

    Spread injections to other LLMs

    Compromising LLMs with tiny multi-stage payloads

    Leaking/exfiltrating user data

    Automated Social Engineering

    Targeting code completion engines
All of these are completely new but unfortunately it seems more difficult to explain the impact to people than we had anticipated.