Hacker News new | ask | show | jobs
Capabilities Can't See Your Agent's Objective (jlmr.dev)
3 points by jelmersnoeck 17 days ago
1 comments

Reconciling intent has a bootstrap problem: it's inferred from the same model you're constraining, so it rationalizes. Side-effect gates — spend, irreversible writes — can't be talked around.