| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by ZekiAI2026 99 days ago

Interesting gap to explore: Sentrial catches drift and anomalies -- failures that happen by accident. What's the defense against failures that happen by design?

Prompt injection is the clearest example: an attacker embeds instructions in content your agent processes. The agent does exactly what it's told. No wrong tool invocations, no hallucinations in the traditional sense -- just an agent successfully executing injected instructions. From a monitoring perspective it looks like normal operation.

Same with adversarial inputs crafted to stay inside your learned "correct" patterns: tool calls are right, arguments are plausible, outputs pass quality checks. The manipulation is in what the agent was pointed at, not in how it behaved.

Curious whether your anomaly detection has a layer for adversarial intent vs. operational drift, or whether that's explicitly out of scope for now.