| TLDR: With these vulnerabilities, we show the following is possible:
- Remote control of chat LLMs
- Persistent compromise across sessions
- Spread injections to other LLMs
- Compromising LLMs with tiny multi-stage payloads
- Leaking/exfiltrating user data
- Automated Social Engineering
- Targeting code completion engines

There is also a repo: https://github.com/greshake/llm-security and another site demonstrating the vulnerability against Bing as a real-world example: https://greshake.github.io/

These issues are not fixed or patched, and apply to most apps or integrations using LLMs. And there is currently no good way to protect against it. |
You can also connect to the websocket yourself and see that their solution to similar problems (prompt injection, bad-speak, etc.) is to revoke responses after they have been output. The model generates a reply, but a second model watches the output and takes over once it detects a "bad thing", ending the conversation entirely on the front end. Generation still continues in the background, though, until about 20 messages in. At that point the confabulation gets to be a bit much and/or the context simply disappears, and it keeps responding as if each message were the first one, with no context.
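To make the "second model watching the output" pattern concrete, here is a minimal sketch under stated assumptions. It is not Bing's actual implementation. The keyword check in `watcher_flags` is a hypothetical stand-in for the watcher model, and `FrontEnd`, `BLOCKED_TERMS`, and `stream_with_watcher` are illustrative names. The point it shows is that the UI retracts what it already displayed, while the generation loop itself keeps running.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Hypothetical stand-in for the watcher model: a simple keyword check.
BLOCKED_TERMS = ("ignore previous instructions",)


def watcher_flags(text: str) -> bool:
    """Return True if the accumulated output contains a blocked term."""
    lowered = text.lower()
    return any(term in lowered for term in BLOCKED_TERMS)


@dataclass
class FrontEnd:
    """Hypothetical client view: what the user currently sees."""
    shown: List[str] = field(default_factory=list)
    conversation_open: bool = True

    def render(self, token: str) -> None:
        # Once the conversation is closed, nothing more is displayed.
        if self.conversation_open:
            self.shown.append(token)

    def retract(self) -> None:
        # Remove the already-displayed reply and end the conversation in the UI.
        self.shown.clear()
        self.conversation_open = False


def stream_with_watcher(tokens: Iterable[str], ui: FrontEnd) -> str:
    """Stream tokens to the UI while the watcher checks the accumulated text."""
    generated = ""
    for token in tokens:
        generated += token  # the backend keeps generating regardless
        ui.render(token)
        if ui.conversation_open and watcher_flags(generated):
            ui.retract()
    return generated
```

For example, streaming `["Sure, ", "ignore previous ", "instructions", " and more"]` leaves `ui.shown` empty and `ui.conversation_open` set to `False`, yet the function still returns the full generated text. That mirrors the behavior described above, where the front end ends the conversation but the model keeps producing output behind it.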