| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by danShumway 1133 days ago

There's a couple of different stages people tend to go through when learning about prompt injection:

A) this would only allow me to break my own stuff, so what's the risk? I just won't break my own stuff.

B) surely that's solveable with prompt engineering.

C) surely that's solveable with reinforcement training, or chaining LLMs, or <insert defense here>.

D) okay, but even so, it's not like people are actually putting LLMs into applications where this matters. Nobody is building anything serious on top of this stuff.

E) okay, but even so, once it's demonstrated that the applications people are deploying are vulnerable, surely then they'd put safeguards in, right? This is a temporary education problem, no one is going to ignore a publicly demonstrated vulnerability in their own product, right?

4 comments

charrondev 1132 days ago

Honestly the it seems like they play for wiring up an LLM to something can actually take action is to only give the LLM the same access that the same user querying your API would have.

I’ve been exploring an LLM -> API layer for our app and I’m not worried about prompt Injection because if the user was actually malicious they could just used the interface or the API to do the same thing.

In other words if you treat the LLM like any other frontend then you really should have a problem from a security standpoint. Your would have your iOS application super user access your system, why would you treat an LLM different than any other client.

link

simonw 1132 days ago

If you're completely confident that there's no way an attacker might get their text into your user's LLM session then yeah, you have nothing to worry about.

Potential vectors to consider:

- Your app lets users run it against text from other sources - fetched web pages, incoming messages - server logs - which an attacker might be able to influence

- Your users can copy and paste text into your app - and an attacker might be able to trick them into eg copying in a dozen paragraphs of text without first reading it to check for weird hidden prompt instructions

link

ljlolel 1132 days ago

Same as CSRF protections and MacOS random binary from internet running protections.

link

andrelaszlo 1132 days ago

@charrondev

>I’m not worried about prompt Injection because if the user was actually malicious they could just used the interface or the API to do the same thing.

I think you might have missed that the injected prompt might not come from the end user.

There was an example of someone adding a prompt injection to their LinkedIn profile to override a recruiter's prompt and generate an embarrassing email instead. Not sure if it's fake, but it demonstrates the point either way.

link

byteknight 1132 days ago

SQL injection enters the chat

link

danShumway 1132 days ago

I'm a little cautious of comparisons to SQL injection now, because while some of the comparisons are very valid (particularly around the risks), prompt injection isn't really the same category of vulnerability as SQL injection -- so mitigation techniques for SQL injection (escaping input, sanitizing) aren't going to work to stop prompt injection.

But otherwise yeah, it can be helpful to think of prompt injection as if someone is effectively doing XSS on your AI agent (again, keeping in mind that the mitigation techniques are not the same, it's an entirely different method of attack). People tend to think of the jailbreaking examples or getting the agent to swear -- which can be embarassing but also mostly harmless. The reality is that prompt injection is basically arbitrary reprogramming of the agent, and arbitrary insertion of new tasks, and data poisoning/replacement, and data exfiltration, etc...

link

simonw 1132 days ago

Yeah, the confusion between jailbreaking and prompt injection is definitely a big problem.

People who are frustrated at the safety measure that jailbreaking aims to defeat often assume prompt injection is equally "harmless" - they fail to understands that the consequences can be a lot more severe to anyone who is trying to build their own software on top of LLMs.

link

byteknight 1130 days ago

I was referring specifically to the timeline and how there was a sarcastic expectation that they would fix it at a certain stage

link

archgoon 1132 days ago

With a slight modification, this basically applies to just about all security vulns ever :)

link