| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by btown 331 days ago

Even if this is a fabricated system, there are all sorts of sensitive things that might be made accessible to an LLM that is fed user-generated data.

For instance, say you have an internal read-only system that knows some details about your proprietary vendor relationships. You wire up an LLM with an internal MCP server to "return the ID and title of the most appropriate product for a customer inquiry." All is well until the customer/attacker submits a form containing text that looks like the JSON for MCP back-and-forth traffic, and aims to exfiltrate your data. Sure, all that JSON was escaped, but you're still trusting that the LLM doesn't get confused, and that the attention heads know what's real JSON and what's fake JSON.

We know not to send sensitive data to the browser, no matter how obfuscated or obscure. What I think is an important mental model is that once your data is being accessed by an LLM, and there's any kind of user data involved, that's an almost equally untrusted environment. You can mitigate, pre-screen for prompt injection-y things, but at the end of the day it may not be enough.