There's a crucial difference here. When you were working as a pentester, how often did you find a security hole and report it and the response was "it is impossible for us to fix that hole"? If you find an XSS or a SQL injection, that means someone made a mistake and the mistake can be fixed. That's not the case for prompt injections.

My favorite paper on prompt injection remedies is this one: https://arxiv.org/abs/2506.08837

Two quotes from that paper:

> once an LLM agent has ingested untrusted input, it must be constrained so that it is impossible for that input to trigger any consequential actions—that is, actions with negative side effects on the system or its environment.

The paper also mentions how detection systems "cannot guarantee prevention of all attacks":

> Input/output detection systems and filters aim to identify potential attacks (ProtectAI.com, 2024) by analyzing prompts and responses. These approaches often rely on heuristic, AI-based mechanisms — including other LLMs — to detect prompt injection attempts or their effects. In practice, they raise the bar for attackers, who must now deceive both the agent’s primary LLM and the detection system. However, these defenses remain fundamentally heuristic and cannot guarantee prevention of all attacks.
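The quoted constraint in working form, as an added example that is not from the comment or from the paper: a small Python capability gate where the taint is decided by where text came from, never by what it says, so a consequential tool is refused once untrusted input is in context unless the exact call was fixed in a plan beforehand (a plan-then-execute style pre-commitment). The `Tool`/`AgentSession` classes and the tool names are constructed for the demo.

```python
from __future__ import annotations
from dataclasses import dataclass, field

# Added example: a structural capability gate for the quoted rule. Taint is
# decided by provenance, never by inspecting the text, so there is no
# heuristic classifier for injected content to evade.

@dataclass(frozen=True)
class Tool:
    name: str
    consequential: bool  # True if the call has side effects outside the agent

TOOLS = {  # constructed tool registry for the demo
    "summarize": Tool("summarize", consequential=False),
    "send_email": Tool("send_email", consequential=True),
    "delete_file": Tool("delete_file", consequential=True),
}

@dataclass
class AgentSession:
    tainted: bool = False  # has any untrusted input been ingested?
    precommitted: list[tuple[str, dict]] = field(default_factory=list)
    log: list[tuple[str, str]] = field(default_factory=list)

    def ingest(self, text: str, trusted: bool) -> None:
        # A web page, email or tool result is untrusted regardless of content;
        # the text itself is deliberately never examined here.
        if not trusted:
            self.tainted = True
        self.log.append(("ingest", "trusted" if trusted else "untrusted"))

    def commit_plan(self, actions: list[tuple[str, dict]]) -> None:
        # Plan-then-execute: consequential (tool, args) pairs are fixed before
        # untrusted data can influence the model's choices.
        if self.tainted:
            raise RuntimeError("cannot commit a plan after untrusted input")
        self.precommitted = list(actions)

    def request(self, tool_name: str, args: dict) -> bool:
        # Called for every tool call the model proposes; returns whether it runs.
        tool = TOOLS[tool_name]
        if not tool.consequential or not self.tainted:
            return True
        # Tainted context: only an exact pre-committed call may run.
        allowed = (tool_name, args) in self.precommitted
        self.log.append(("allowed" if allowed else "denied", tool_name))
        return allowed
```

Once `tainted` is set, a changed recipient or any new consequential tool call is denied without reading a word of the ingested content, which is what "impossible for that input to trigger any consequential actions" means in practice.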