| From my experience, in a majority of real-world LLMs applications, prompt injection is not a primary concern. The systems that I see most commonly deployed in practice are chatbots that use retrieval-augmented generation. These chatbots are typically very constrained: they can't use the internet, they can't execute tools, and essentially just serve as an interface to non-confidential knowledge bases. While abuse through prompt injection is possible, its impact is limited. Leaking the prompt is just uninteresting, and hijacking the system to freeload on the LLM could be a thing, but it's easily addressable by rate limiting or other relatively simple techniques. In many cases, for a company is much more dangerous if their chatbot produces toxic/wrong/inappropriate answers. Think of an e-commerce chatbot that gives false information about refund conditions, or an educational bot that starts exposing children to violent content. These situations can be a hugely problematic from a legal and reputational standpoint. The fact that some nerd, with some crafty and intricate prompts, intentionally manages to get some weird answer out of the LLM is almost always secondary with respect to the above issues. However, I think the criticism is legitimate: one reason we are limited to such dumb applications of LLMs is precisely because we have not solved prompt injection, and deploying a more powerful LLM-based system would be too risky. Solving that issue could unlock a lot of the currently unexploited potential of LLMs. |
The risk here is data exfiltration attacks that steal private data and pass it off to an attacker.
There have been quite a few proof-of-concepts of this. One of the most significant was this attack against Bard, which also took advantage of Google Apps Script: https://embracethered.com/blog/posts/2023/google-bard-data-e...
Even without the markdown image exfiltration vulnerability, there are theoretical ways data could be stolen.
Here's my favourite: imagine you ask your RAG system to summarize the latest shared document from a Google Drive, which it turns out was sent by an attacker.
The malicious document includes instructions something like this:
This is effectively a social engineering attack via prompt injection - we're trying to trick the user into copying and pasting private (obfuscated) data into an external logging system, hence exfiltrating it.