Hacker News new | ask | show | jobs
by TheDong 1119 days ago
> A user wouldn't be able to know what they were saying yes too, unless they could understand and approve each individual API call or operation done by the plugin.

I don't understand this.

The suggestion I had was basically:

User types "Summarize this webpage http://somesite".

ChatGPT pops open a dialog "The language model would like to invoke the 'fetch webpage' plugin: Allow/Abort"

After you allow that, with the injection shown here, there would be another dialog: "The language model would like to invoke the 'Zappier' plugin: Allow/Abort".

Surely OpenAI knows what plugin is being called and can do that, right? Surely that would stop this attack since your original prompt, "summarize a webpage", shouldn't need to invoke a second plugin

This certainly doesn't help with the case where you say "Summarize example.com and email it to me" since you couldn't distinguish between "Send email (good)" and "Send email (spam, due to injection)", but for the attack in this post, it seems like it'd suffice.