|
I can offer a hacking/penetrarion testing perspective to this as a security researcher at a security consultint firm: this type of hallucination and trust is one of the largest things we exploit in our new LLM testing service. Overly agentic systems (one of the top 10 OWASP LLM vulns) is the most profound and commonly exploited issue that we've been able to leverage. If we can get an internal, sensitive-data-handling agent to ingest a crafted prompt, either via direct prompt injection against a more abstract “parent” agent, or by tainting an input file/URL it’s told to process, we can plant what I have internally coined an “unfolding injection.” The injection works like a parasitic goal, it doesn’t just trick one agent, it rewrites the downstream intent. As the orchestrator routes tasks to other agents, each one treats the tainted instructions as legitimate and works toward fulfilling them. Because many orchestrations re-summarize, re-plan, or synthesize goals between steps, the malicious instructions can actually gain fidelity as they propagate. By the time they reach a sensitive action (exfiltration, privilege escalation, external calls), there’s no trace of the original “weird” wording, just a confidently stated, fully-integrated sub-goal. It’s essentially a supply-chain attack on the orchestration layer: you compromise one node in the agent network, and the rest “help” you without realizing it. Without explicit provenance tracking and policy enforcement between agents, this kind of unfolding injection is almost trivial to pull off, and we've been able to compromise entire environments based on the information the agentic system provided us, or just gave us either a bind or reverse shell in the case it has cli access and ability to figure out its own network constraints. SSRF has been making a HUGE return in agentic systems, and Im sad defcon and black hat didnt really have many talks on this subject this year, because it is a currently evolving security domain and entirely new method of exploitation. The entire point of agentic systems is non determinism, but it also makes it a security nightmare. As a researcher though, this is basically a gold mine of all sorts of new vulnerabilities we'll be seeing. If you work as a bugbounty hunter and see a new listing for an AI company I can almost assuredly say you can get a pretty massive payout just by exploiting the innate trust between agents and the internal tools they are leveraging. Even if you dont have the architecture docs of the agentic system you can likely prompt inject the initial task enough to taint the further agents to have them list out the orchestration flow by creatively adjusting your prompt for different types of orchestration and how the company might be doing prompt engineering on the agents persona and task its designed to work on and then submit report on to parent agent, and the limited input validation between them. |