Capabilities Can't See Your Agent's Objective


	Capabilities Can't See Your Agent's Objective (jlmr.dev)
	3 points by jelmersnoeck 17 days ago

1 comment

hiroto_lemon 17 days ago

Reconciling intent has a bootstrap problem: the intent is inferred from the same model you're trying to constrain, so the model can rationalize its way around it. Side-effect gates (spend limits, irreversible writes) can't be talked around.
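A minimal sketch of the kind of side-effect gate the comment describes, assuming a hypothetical `ToolCall` / `SideEffectGate` interface (none of these names come from the linked post or any particular agent framework). The gate decides only from a call's observable effects, its cost and whether it is irreversible. The model-supplied `rationale` is carried along but never consulted, so no argument from the model can change the outcome.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """Hypothetical tool call proposed by the agent."""
    name: str
    cost: float = 0.0           # money this call would spend
    irreversible: bool = False  # e.g. deletes, payments, outbound messages
    rationale: str = ""         # model-supplied justification; the gate ignores it


class GateDenied(Exception):
    """Raised when a call would exceed a hard side-effect limit."""


@dataclass
class SideEffectGate:
    """Hypothetical gate enforcing limits on effects, not on stated intent."""
    spend_limit: float
    spent: float = 0.0
    # Irreversible actions pre-approved out of band (e.g. by a human operator).
    approved_irreversible: set = field(default_factory=set)

    def check(self, call: ToolCall) -> None:
        # call.rationale is deliberately never read: the decision depends only
        # on what the call would do, so the model cannot argue its way past it.
        if self.spent + call.cost > self.spend_limit:
            raise GateDenied(f"{call.name}: spend limit exceeded")
        if call.irreversible and call.name not in self.approved_irreversible:
            raise GateDenied(f"{call.name}: irreversible write needs out-of-band approval")
        # Only charge the budget once the call has passed every check.
        self.spent += call.cost


if __name__ == "__main__":
    gate = SideEffectGate(spend_limit=10.0, approved_irreversible={"send_invoice"})
    gate.check(ToolCall("search", cost=2.0))
    try:
        gate.check(ToolCall("delete_repo", irreversible=True,
                            rationale="The user clearly wants this cleaned up."))
    except GateDenied as err:
        print(err)  # delete_repo: irreversible write needs out-of-band approval
```

The design choice mirrors the comment's point: an intent check would have to evaluate the rationale string, which the constrained model itself wrote, while this gate's inputs (cost, reversibility, an approval list held outside the model) are ones the model cannot reinterpret.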
