Hacker News new | ask | show | jobs
by greshake 1133 days ago
I just published a blog post showing that that is not what is happening. Companies are plugging LLMs into absolutely anything, including defense/threat intelligence/cybersecurity/legal etc. applications: https://kai-greshake.de/posts/in-escalating-order-of-stupidi...
2 comments

There's a couple of different stages people tend to go through when learning about prompt injection:

A) this would only allow me to break my own stuff, so what's the risk? I just won't break my own stuff.

B) surely that's solveable with prompt engineering.

C) surely that's solveable with reinforcement training, or chaining LLMs, or <insert defense here>.

D) okay, but even so, it's not like people are actually putting LLMs into applications where this matters. Nobody is building anything serious on top of this stuff.

E) okay, but even so, once it's demonstrated that the applications people are deploying are vulnerable, surely then they'd put safeguards in, right? This is a temporary education problem, no one is going to ignore a publicly demonstrated vulnerability in their own product, right?

Honestly the it seems like they play for wiring up an LLM to something can actually take action is to only give the LLM the same access that the same user querying your API would have.

I’ve been exploring an LLM -> API layer for our app and I’m not worried about prompt Injection because if the user was actually malicious they could just used the interface or the API to do the same thing.

In other words if you treat the LLM like any other frontend then you really should have a problem from a security standpoint. Your would have your iOS application super user access your system, why would you treat an LLM different than any other client.

If you're completely confident that there's no way an attacker might get their text into your user's LLM session then yeah, you have nothing to worry about.

Potential vectors to consider:

- Your app lets users run it against text from other sources - fetched web pages, incoming messages - server logs - which an attacker might be able to influence

- Your users can copy and paste text into your app - and an attacker might be able to trick them into eg copying in a dozen paragraphs of text without first reading it to check for weird hidden prompt instructions

Same as CSRF protections and MacOS random binary from internet running protections.
@charrondev

>I’m not worried about prompt Injection because if the user was actually malicious they could just used the interface or the API to do the same thing.

I think you might have missed that the injected prompt might not come from the end user.

There was an example of someone adding a prompt injection to their LinkedIn profile to override a recruiter's prompt and generate an embarrassing email instead. Not sure if it's fake, but it demonstrates the point either way.

SQL injection enters the chat
I'm a little cautious of comparisons to SQL injection now, because while some of the comparisons are very valid (particularly around the risks), prompt injection isn't really the same category of vulnerability as SQL injection -- so mitigation techniques for SQL injection (escaping input, sanitizing) aren't going to work to stop prompt injection.

But otherwise yeah, it can be helpful to think of prompt injection as if someone is effectively doing XSS on your AI agent (again, keeping in mind that the mitigation techniques are not the same, it's an entirely different method of attack). People tend to think of the jailbreaking examples or getting the agent to swear -- which can be embarassing but also mostly harmless. The reality is that prompt injection is basically arbitrary reprogramming of the agent, and arbitrary insertion of new tasks, and data poisoning/replacement, and data exfiltration, etc...

Yeah, the confusion between jailbreaking and prompt injection is definitely a big problem.

People who are frustrated at the safety measure that jailbreaking aims to defeat often assume prompt injection is equally "harmless" - they fail to understands that the consequences can be a lot more severe to anyone who is trying to build their own software on top of LLMs.

I was referring specifically to the timeline and how there was a sarcastic expectation that they would fix it at a certain stage
With a slight modification, this basically applies to just about all security vulns ever :)
Yes, but most companies aren’t allowing unfettered access to promoting, either.

My insider risk — a developer who attempts to extract training data, a LLM being leaked of internal data, or an employee who wants to break the prompt for competitive gain — is a lot different of a threat than allowing all of my customers a tool to query their data using LLM’s.