| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Kevcmk 1170 days ago
	I believe that what is being missed in this thread is that, as it stands, user consent can be forged by prompt injection.

1 comments

TheDong 1170 days ago

There's a clear point where an API call is being made. That point is when a blocking consent prompt could show up.

Like, at worst openAI could "mitm" the prompt's call, and display a pop up modal asking for permission.

I'm not suggestion that you handle this by having the user type "I give permission to call google".

I don't see how it could be possible to forge user consent that is delivered to openAI's servers via a separate mechanism from the model. You'd have to give the LLM a "accept openAI permission prompts" or "run arbitrary javascript in the chatgpt browser session" plugin for it to then be able to use that plugin to bypass modal dialogs for other plugins.

link

TeMPOraL 1170 days ago

There is always one other way left - the usual ways all the scummy companies do this on the web and mobile: make the consent prompt inscrutable, or feel necessary in context, or both.

link

holmesworcester 1170 days ago

Yeah, the malicious prompt injection could be buried in a page of inscrutable text, right? A user wouldn't be able to know what they were saying yes too, unless they could understand and approve each individual API call or operation done by the plugin.

link

TheDong 1170 days ago

> A user wouldn't be able to know what they were saying yes too, unless they could understand and approve each individual API call or operation done by the plugin.

I don't understand this.

The suggestion I had was basically:

User types "Summarize this webpage http://somesite".

ChatGPT pops open a dialog "The language model would like to invoke the 'fetch webpage' plugin: Allow/Abort"

After you allow that, with the injection shown here, there would be another dialog: "The language model would like to invoke the 'Zappier' plugin: Allow/Abort".

Surely OpenAI knows what plugin is being called and can do that, right? Surely that would stop this attack since your original prompt, "summarize a webpage", shouldn't need to invoke a second plugin

This certainly doesn't help with the case where you say "Summarize example.com and email it to me" since you couldn't distinguish between "Send email (good)" and "Send email (spam, due to injection)", but for the attack in this post, it seems like it'd suffice.

link