| I'm concerned that OpenAI's example documentation suggests using this to A) construct SQL queries and B) summarize emails, but that their example code doesn't include clear hooks for human validation before actions are called. For a recipe builder it's not so big a deal, but I really worry about how eager people are to remove human review from these steps. Doing so gets rid of a very important mechanism for reducing the risks of prompt injection. The top comment here suggests wiring this up to allow GPT-4 to recursively call itself. Meanwhile, some of the best advice I've seen from security professionals on secure LLM app development is to completely isolate queries from each other whenever possible, to reduce the damage that a compromised agent can do before its "memory" is wiped. There are definitely ways to use this safely, and there are definitely some pretty powerful apps you could build on top of this without much risk. Using LLMs as a transformation layer for trusted input is a good use case. But are devs going to stick with that? Is it going to be used safely? Do devs understand any of the risks or how to mitigate them in the first place? 3rd-party plugins on ChatGPT have repeatedly been vulnerable in the real world, and I'm worried about what mistakes developers are going to make now that they're actively encouraged to treat GPT as even more of a low-level data layer. Especially since OpenAI's documentation on how to build secure apps is mostly pretty bad, and they don't seem to be spending much time or effort educating developers/partners on how to approach LLM security. |
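To make the "hook for human validation" idea concrete, here is a minimal sketch of what I mean. It assumes the model hands back a function name plus a JSON string of arguments; `ACTIONS`, `run_sql`, `execute_with_review`, and `console_approve` are all names I made up for illustration, not anything from OpenAI's docs, and `run_sql` is a stub that doesn't touch a real database.

```python
import json
from typing import Any, Callable, Dict


def run_sql(query: str) -> str:
    # Stub for illustration only: a real app would run the query here.
    return f"(pretend we ran: {query})"


# Hypothetical allow-list of actions the app is willing to expose to the model.
ACTIONS: Dict[str, Callable[..., Any]] = {"run_sql": run_sql}


def execute_with_review(
    name: str,
    arguments_json: str,
    approve: Callable[[str, dict], bool],
) -> dict:
    """Run a model-proposed call only if it is allow-listed and a reviewer approves."""
    if name not in ACTIONS:
        raise ValueError(f"unknown action: {name}")
    args = json.loads(arguments_json)
    # The reviewer sees the exact call before anything happens.
    if not approve(name, args):
        return {"status": "rejected", "result": None}
    return {"status": "executed", "result": ACTIONS[name](**args)}


def console_approve(name: str, args: dict) -> bool:
    # Simplest possible reviewer: ask a human on the terminal, default to "no".
    answer = input(f"Model wants to call {name}({args}). Allow? [y/N] ")
    return answer.strip().lower() == "y"
```

In use, you'd call something like `execute_with_review(call_name, call_args, console_approve)` instead of dispatching the model's output directly. The point is only that the approval step sits between "the model asked for it" and "it ran"; how you present the request to the reviewer is up to the app.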
At that point, prompt injection is no longer an issue - because the AI doesn't need to hide anything.
Giving GPT access to your entire database, but telling it not to reveal certain bits, is never going to work. There will always be side channel vulnerabilities in those systems.
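A rough sketch of the alternative, assuming a made-up `emails` table with an `owner_id` column (the schema and the helper names `fetch_rows_for_user`, `build_prompt`, and `demo` are all hypothetical): the application filters by the requesting user before anything reaches the prompt, so the model never holds data it would need to keep secret.

```python
import sqlite3


def fetch_rows_for_user(conn: sqlite3.Connection, user_id: int) -> list:
    """Return only the rows this user is already allowed to see."""
    cur = conn.execute(
        "SELECT subject, body FROM emails WHERE owner_id = ?",
        (user_id,),
    )
    return cur.fetchall()


def build_prompt(rows: list) -> str:
    # Only pre-filtered rows are placed in the prompt text.
    parts = [f"Subject: {subject}\n{body}" for subject, body in rows]
    return "Summarize these emails:\n\n" + "\n---\n".join(parts)


def demo() -> str:
    # In-memory database with two users' mail, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emails (owner_id INTEGER, subject TEXT, body TEXT)")
    conn.executemany(
        "INSERT INTO emails VALUES (?, ?, ?)",
        [(1, "Lunch", "Noon?"), (2, "Payroll", "Secret salary data")],
    )
    return build_prompt(fetch_rows_for_user(conn, 1))
```

Here the access control lives in the `WHERE` clause, not in the prompt. An injected instruction can still make the model misbehave with user 1's own mail, but there is nothing belonging to user 2 for it to leak.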