| There's a couple of different stages people tend to go through when learning about prompt injection: A) this would only allow me to break my own stuff, so what's the risk? I just won't break my own stuff. B) surely that's solvable with prompt engineering. C) surely that's solvable with reinforcement training, or chaining LLMs, or <insert defense here>. D) okay, but even so, it's not like people are actually putting LLMs into applications where this matters. Nobody is building anything serious on top of this stuff. E) okay, but even so, once it's demonstrated that the applications people are deploying are vulnerable, surely then they'd put safeguards in, right? This is a temporary education problem; no one is going to ignore a publicly demonstrated vulnerability in their own product, right? |
I’ve been exploring an LLM -> API layer for our app, and I’m not worried about prompt injection, because if the user were actually malicious they could just use the interface or the API to do the same thing.
In other words, if you treat the LLM like any other frontend, then you really shouldn't have a problem from a security standpoint. You wouldn't give your iOS application superuser access to your system, so why would you treat an LLM differently from any other client?
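
A rough sketch of that idea in Python, under stated assumptions: the `User` type, the `NOTES` store, the `delete_note` endpoint, and the shape of the tool call are all hypothetical and only for illustration. The point it tries to show is that the authorization check lives in the API, so a call proposed by the LLM runs with the requesting user's privileges and nothing more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    is_admin: bool = False


# Hypothetical in-memory store: note id -> (owner id, text).
NOTES = {
    "n1": ("alice", "alice's note"),
    "n2": ("bob", "bob's note"),
}


class Forbidden(Exception):
    """Raised when the caller is not allowed to perform the action."""


def delete_note(user: User, note_id: str) -> None:
    """API endpoint. Authorization is enforced here, not in the client."""
    owner, _text = NOTES[note_id]
    if owner != user.user_id and not user.is_admin:
        raise Forbidden(f"{user.user_id} may not delete {note_id}")
    del NOTES[note_id]


def handle_llm_tool_call(user: User, tool_call: dict) -> str:
    """Execute a tool call proposed by the LLM with the user's own privileges.

    The LLM output is treated as untrusted client input: it can only reach
    the same endpoint, with the same checks, as the UI or a direct API call.
    """
    if tool_call.get("name") != "delete_note":
        return "error: unknown tool"
    try:
        delete_note(user, tool_call["args"]["note_id"])
    except (Forbidden, KeyError) as exc:
        return f"error: {exc}"
    return "ok"
```

With this layout, an injected prompt that talks the model into emitting `{"name": "delete_note", "args": {"note_id": "n2"}}` on alice's behalf gets the same refusal alice would get by calling the API herself. This only covers the case where the person typing is the only source of untrusted text; it says nothing about content the LLM reads from third parties.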