by bluemario
102 days ago
The "human in the loop at key checkpoints" pattern has been the most practically useful for us. We found that giving the agent full autonomy end-to-end produces subtly broken code that passes tests but violates implicit invariants you never thought to write down. Short loops with a human sanity check at decision forks catch that class of failure early. The thing I keep wrestling with is where exactly to place those checkpoints. Too frequent and you've just built a slow pair programmer. Too infrequent and you're doing expensive archaeology to figure out where it went sideways. We've landed on "before any irreversible action" as a useful heuristic, but that requires the agent to have some model of what's irreversible, which is its own can of worms. Has anyone found a principled way to communicate implicit codebase conventions to an agent beyond just dumping a CLAUDE.md or similar file? We've tried encoding constraints as linter rules, but that only catches surface stuff, not architectural intent.
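A rough sketch of the "before any irreversible action" heuristic, assuming the agent's steps arrive as typed actions and that a hand-maintained set of action kinds stands in for the agent's model of what's irreversible. Every name here (`Action`, `IRREVERSIBLE_KINDS`, `run_plan`) is hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Assumption: a hand-written set of action kinds we treat as irreversible.
# A real setup would need something richer than a static list.
IRREVERSIBLE_KINDS = {"delete_file", "force_push", "drop_table", "deploy"}


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "edit_file", "force_push"
    target: str  # e.g. a path or branch name


def needs_checkpoint(action: Action) -> bool:
    """Return True if a human should sign off before this action runs."""
    return action.kind in IRREVERSIBLE_KINDS


def run_plan(
    plan: Iterable[Action],
    execute: Callable[[Action], None],
    approve: Callable[[Action], bool],
) -> list[Action]:
    """Execute actions in order, pausing for approval before irreversible ones.

    Stops at the first rejected action and returns what was executed so far.
    """
    done: list[Action] = []
    for action in plan:
        if needs_checkpoint(action) and not approve(action):
            break  # human said no: stop before the irreversible step
        execute(action)
        done.append(action)
    return done
```

Reversible steps run without interruption, so the human is only pulled in at the forks that are expensive to undo. The hard part stays where the comment puts it: deciding what belongs in the irreversible set.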
|
I wanted something I could use to objectively decide whether one test (or gate, as I call them) is better than another, and how they work together as a system.
My personal tool encodes a workflow that has stages and gates. The gates enforce handoff. Once I did this I went from ~73% first-pass approval to over 90% just by adding structured checks at stage boundaries.
My hope is that we can have a common vocabulary to talk about this, so I wrote up the data and the framework that fell out of it: https://michael.roth.rocks/research/trust-topology/
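To make "stages and gates" concrete, here is a minimal sketch of gates enforcing handoff at stage boundaries. It is an illustration under assumptions, not the tool described above: `Stage`, `run_workflow`, and the dict-shaped artifact are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Artifact = dict                      # assumption: work product is a plain dict
Gate = Callable[[Artifact], bool]    # a gate returns True when the check passes


@dataclass
class Stage:
    name: str
    work: Callable[[Artifact], Artifact]
    # Named gates checked at this stage's exit boundary.
    gates: list[tuple[str, Gate]] = field(default_factory=list)


def run_workflow(
    stages: list[Stage], artifact: Artifact
) -> tuple[Artifact, Optional[str]]:
    """Run stages in order; block handoff at the first failing gate.

    Returns the artifact plus "stage:gate" for the failing gate, or None
    if every gate passed.
    """
    for stage in stages:
        artifact = stage.work(artifact)
        for gate_name, gate in stage.gates:
            if not gate(artifact):
                return artifact, f"{stage.name}:{gate_name}"
    return artifact, None
```

Recording which gate stopped the handoff is what would let you compare gates against each other later, for example by counting how often each one fires before a human rejection.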