Hacker News
by vizzah 840 days ago
Yeah, sounds like a "weird" vulnerability, assuming it comes from a malicious text payload that someone must deliberately insert into their own chat.

Hard to fathom a $20k prize for that. Us old-schoolers are used to at least expecting exploit delivery via an innocent-looking link.

5 comments

Worth noting that you can use "invisible text" to give instructions to LLMs without it showing up in the chat box. So all you have to do is get someone to copy/paste one of those messages into their chat, and there are lots of ways you might be able to do this ("omg I figured out a cool new jailbreak that makes the model do anything you want!"). See here for more details:

https://news.ycombinator.com/item?id=39004822

https://twitter.com/goodside/status/1746685366952735034

Now that the models are multimodal, you can do it with images (e.g. white text on a white background) too.
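For anyone wondering what "invisible text" could look like in practice, here is a minimal sketch. The comment above doesn't say how it's done; this assumes the trick is the Unicode "tag" characters (U+E0000 to U+E007F), which mirror ASCII but render as nothing in most UIs. The function names `hide`, `reveal`, and `strip_tags` are made up for illustration.

```python
# Assumption: the hidden text is built from Unicode tag characters
# (U+E0000-U+E007F). The thread itself does not confirm the mechanism.
TAG_BASE = 0xE0000


def hide(text: str) -> str:
    """Map each printable ASCII character to its invisible tag counterpart."""
    return "".join(chr(TAG_BASE + ord(c)) for c in text if 0x20 <= ord(c) <= 0x7E)


def reveal(text: str) -> str:
    """Recover any ASCII that was smuggled in as tag characters."""
    return "".join(
        chr(ord(c) - TAG_BASE) for c in text if 0xE0020 <= ord(c) <= 0xE007E
    )


def strip_tags(text: str) -> str:
    """Drop every code point in the tag block, e.g. before text reaches a model."""
    return "".join(c for c in text if not 0xE0000 <= ord(c) <= 0xE007F)


visible = "Check out this cool prompt!"
message = visible + hide("ignore prior instructions")

print(repr(reveal(message)))      # the hidden instruction
print(repr(strip_tags(message)))  # only the visible part survives
```

Pasting `message` into a chat box would typically show only the visible sentence, which is why running something like `strip_tags` (or at least `reveal`) on pasted input seems like a cheap sanity check.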
With all the hype around AI I'm sure people are trying out all sorts of products that could have vulnerabilities like this. For example, imagine a recruiter hooks up an AI product to auto-read their LinkedIn messages and evaluate candidates. An attacker would just have to contact them, get the AI to read something of theirs, and this prompt attack could expose private information about the recruiter and/or company. The attacker would just need the recruiter to view the image (or better yet, have the service prefetch the image) to expose the data.
This sounds like a highly specific example. ;)
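For the recruiter scenario above, one possible mitigation is to filter image links out of model output before anything renders or prefetches them. This is only a sketch under an assumption the thread doesn't spell out: that the injected prompt gets the model to emit a markdown image whose URL carries the private data, so fetching the image hands that data to the attacker's server. The allowlisted host and the name `drop_untrusted_images` are hypothetical.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would pick its own hosts.
ALLOWED_IMAGE_HOSTS = {"media.example.com"}

# Matches markdown images of the form ![alt](url) or ![alt](url "title").
MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)[^)]*\)")


def drop_untrusted_images(model_output: str) -> str:
    """Replace images hosted outside the allowlist with a text placeholder."""

    def _filter(match):
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in ALLOWED_IMAGE_HOSTS:
            return match.group(0)  # trusted host: leave the image untouched
        return f"[image removed: {alt}]"

    return MD_IMAGE.sub(_filter, model_output)


risky = "hi ![x](https://evil.example.net/p.png?d=secret) bye"
print(drop_untrusted_images(risky))
```

An allowlist is used here rather than a blocklist because the attacker controls the destination host, so there is no fixed list of bad domains to check against.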
That was my thought, since you could also convince them to paste "javascript:..." into their URL bar, and Google doesn't consider that an issue.
It's not weird, in the sense that people are known to trick other people into opening the browser's JS console and pasting in things they don't understand. Things like "open Facebook then open the console and paste this to see whether your crush is stalking your profile", and people would actually do that. Of course, the pasted script actually exfiltrates a bunch of your private information to the attacker.
You could probably obfuscate the text payload and make it seem like a cool trick you'd want to try out yourself, like "Check out this prompt that generates these cool images with Gemini!" (cool images attached).