Hacker News new | ask | show | jobs
by danShumway 953 days ago
I'd view this article as an example. I suspect it's not that hard to get a malicous document into someone's drive; basically any information you give to Bard is vulnerable to this attack if Bard then interacts with 3rd-party content. Email agents also come to mind, where an attacker can get a prompt into the LLM by sending an email that the LLM will then analyze in your inbox. Basically any scenario where an LLM is primed with a user's data and allows making external requests, even for images.

Integration between assistants is another problem. Let's say you're confident that a malicious prompt can never get into your own personal Google Drive. But let's say Google Bard keeps the ability to analyze your documents and also gains the ability to do web searches when you ask questions about those documents. Or gets browser integration via an extension.

Now, when you visit a malicious web page with hidden malicious commands, that data can be accessed and exfiltrated by the website.

Now, you could strictly separate that data behind some kind of prompt, but then it's impossible to have an LLM carry on the same conversation in both contexts. So if you want your browsing assistant to be unable to leak information about your documents or visited sites, you need to accept that you don't get the ability to give a composite command like, "can you go into my bookmarks and add 'long', 'medium', or 'short' tags based on the length of each article?" Or at least, you need to have a very dedicated process for that as opposed to a general one, which makes sure that there is no singular conversation that touches both your bookmarks and the contents of each page. They need to be completely isolated from each other, which is not what most people are imagining when they talk about general assistants.

Remember that there is no difference between prompt extraction by a user and conversation/context extraction from an attacker. They're both just getting the LLM to repeat previous parts of the input text. If you have given an LLM sensitive information at any point during conversation, then (if you want to be secure) the LLM must not interact with any kind of untrusted data, or it must be isolated from any meaningful APIs including the ability to make 3rd-party GET requests and it must never be allowed to interact with another LLM that has access to those APIs.

1 comments

Properly sandboxing and firewalling LLMs is going to be the killer app.