| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by agent_invariant 105 days ago

Interesting data, especially the retry-escalation pattern. We’ve seen something very similar in internal testing the agent doesn’t interpret a failure as a boundary, it interprets it as a problem to route around.

Your “helpful lie” point is spot on. Once the agent gets into a failure state it will often try to resolve the narrative rather than the system state.

The mental model we’ve been experimenting with is similar to what you describe about infrastructure boundaries. Instead of trying to make the model behave better, we assume it will eventually do something unsafe and focus on what happens at the commit boundary.

In our case the agent can propose actions, but anything that mutates real state (DB writes, API calls, payments etc.) has to pass through a deterministic gate first. The gate doesn’t try to understand intent? it just enforces mechanical invariants like sequencing, replay protection, and bounded actions before a commit is allowed.

The interesting side effect is you get an append-only ledger of proposals, freezes, and commits, which ends up being incredibly useful for understanding how agents actually behave in the wild.

Your sandbox approach and this kind of execution gating feel pretty complementary. Sandbox prevents access to things the agent should never touch, and a commit boundary protects the actions it is allowed to attempt.

Do your logs show patterns where agents keep retrying the same action after a failure, even when the environment makes it impossible? That seems to show up a lot.