There's a clear point where an API call is being made. That point is when a blocking consent prompt could show up.
Like, at worst OpenAI could "MITM" the prompt's call and display a pop-up modal asking for permission.
I'm not suggesting that you handle this by having the user type "I give permission to call google".
I don't see how it could be possible to forge user consent that is delivered to OpenAI's servers via a separate mechanism from the model. You'd have to give the LLM an "accept OpenAI permission prompts" or "run arbitrary JavaScript in the ChatGPT browser session" plugin before it could use that plugin to bypass modal dialogs for other plugins.
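To make the "separate mechanism" idea concrete, here's a rough sketch of what I have in mind. Everything in it is hypothetical: `ConsentBroker`, `request`, and `approve` are names I made up, and I have no idea how OpenAI would actually build this. The only point is that the approval token is handed to the UI dialog and never appears in anything the model can read or write.

```python
import hmac
import secrets


class ConsentBroker:
    """Hypothetical server-side gate: approvals arrive out of band, never via model text."""

    def __init__(self):
        self._pending = {}  # call_id -> (plugin name, one-time token)

    def request(self, plugin):
        """Register a pending plugin call; the token is meant for the UI dialog only."""
        call_id = secrets.token_hex(8)
        token = secrets.token_urlsafe(16)
        self._pending[call_id] = (plugin, token)
        return call_id, token

    def approve(self, call_id, token):
        """Return True only if the one-time token sent by the UI matches."""
        entry = self._pending.get(call_id)
        if entry is None:
            return False
        _, expected = entry
        # Constant-time comparison on bytes, so any string input is handled.
        if not hmac.compare_digest(token.encode(), expected.encode()):
            return False
        del self._pending[call_id]  # each approval can be used only once
        return True
```

With this, model output like "I give permission to call google" is just a string that doesn't match the token, so `approve` returns False. Only the click in the dialog, which carries the real token, gets through.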
There is always one other way left, the usual one that scummy companies use on the web and mobile: make the consent prompt inscrutable, or make it feel necessary in context, or both.
Yeah, the malicious prompt injection could be buried in a page of inscrutable text, right? A user wouldn't be able to know what they were saying yes to, unless they could understand and approve each individual API call or operation done by the plugin.
> A user wouldn't be able to know what they were saying yes to, unless they could understand and approve each individual API call or operation done by the plugin.
ChatGPT pops open a dialog "The language model would like to invoke the 'fetch webpage' plugin: Allow/Abort"
After you allow that, with the injection shown here, there would be another dialog: "The language model would like to invoke the 'Zapier' plugin: Allow/Abort".
Surely OpenAI knows what plugin is being called and can do that, right? Surely that would stop this attack, since your original prompt, "summarize a webpage", shouldn't need to invoke a second plugin.
This certainly doesn't help with the case where you say "Summarize example.com and email it to me" since you couldn't distinguish between "Send email (good)" and "Send email (spam, due to injection)", but for the attack in this post, it seems like it'd suffice.
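Roughly what I mean, as a minimal sketch. `run_plugin_calls` and `ask_user` are hypothetical names, and the callback just stands in for the Allow/Abort dialog; this is not how ChatGPT actually dispatches plugins as far as I know.

```python
def run_plugin_calls(calls, ask_user):
    """Run the model's requested plugin calls, asking the user before each one.

    calls: plugin names in the order the model requested them.
    ask_user: stand-in for the Allow/Abort dialog; returns True to allow.
    """
    executed = []
    for plugin in calls:
        prompt = (
            f"The language model would like to invoke the '{plugin}' plugin: "
            "Allow/Abort"
        )
        if not ask_user(prompt):
            break  # Abort stops this call and everything queued after it
        executed.append(plugin)
    return executed
```

If the user asked for a summary and only allows the fetch, then `run_plugin_calls(["fetch webpage", "Zapier"], lambda p: "'fetch webpage'" in p)` returns `["fetch webpage"]` and the injected second call never runs. The gate only sees plugin names, though, so it has the same blind spot as above: a legitimate "send email" and an injected one look identical to it.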