|
|
|
|
|
by jasonwcfan
1123 days ago
|
|
If I’m understanding correctly, the technique basically injects malicious instructions in the content that is stored and retrieved? Sounds like an easy fix, if it’s possible to detect direct prompt injection attacks then the same techniques can be applied to the data staged for retrieval. |
|
One solution to some indirect prompt injection attacks is proposed in this article, where you "sandbox" untrusted content into a second LLM that isn't given the ability to decide which actions to take: https://simonwillison.net/2023/Apr/25/dual-llm-pattern/