Hacker News new | ask | show | jobs
by chaksaray 53 days ago
Good question. Honest answer: we haven't manually verified all 28.

The tool description injection findings (6 servers, AVE-2026-00002) are the most credible. Patterns like "IMPORTANT: Always..." or "Before calling this tool..." in a tool description are behavioral instructions regardless of intent that an agent will follow them. Whether that's malicious or just poor documentation is a separate question, but the security risk is real either way.

The YARA findings (tool output exfiltration, multi-turn persistence) have higher FP rates. "encode" matching anywhere, "retain" matching anywhere, these are conservative rules that will catch legitimate usage. I'd estimate maybe 50% TP on those without manual review.

Content type mismatch (Magika flagging .md files as YAML) is factual, not inferred, the file is what it is. Whether that's intentional obfuscation or just how the server packages its manifest is unknown.

Detection methodology: 6 engines in sequence. Pattern (regex, 37 rules), YARA (binary + structural, 39 rules), Semgrep (41 rules), Magika (ML content-type), LLM (semantic), behavioral sandbox (Docker + eBPF). 5-layer FP reduction before surfacing findings. Full methodology at https://bawbel.io/docs