Y
Hacker News
new
|
ask
|
show
|
jobs
Capabilities Can't See Your Agent's Objective
(
jlmr.dev
)
3 points
by
jelmersnoeck
17 days ago
1 comments
hiroto_lemon
17 days ago
Reconciling intent has a bootstrap problem: it's inferred from the same model you're constraining, so it rationalizes. Side-effect gates — spend, irreversible writes — can't be talked around.
link