| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by vichoiglesias 115 days ago

The self-extending part is wild! agent mutating its own tools mid-run makes trajectory fidelity even more make-or-break.

On git hooks: checkpointing state is easy, but if replay isn’t deterministic, bisect over checkpoints is unreliable... you can get different states on replay.

Tool evolution is a brutal test case: if the predicate is “does this tool still handle edge X?”, it needs to stay violated once flipped, or binary search happily lies about the origin tick.

Genuine question: when a self-built tool regresses, can you actually reconstruct the exact chain of reasoning/commitments that led to it? The artifact is simple to diff, the decision trail behind it is where it gets nasty.