by mrothroc
107 days ago
I ended up building my own for this: a SQLite backend that breaks work into stages with gates between them. A gate checks each handoff before the next stage starts: does the code match the spec, did the tests pass, that kind of thing. I've been running it with Claude Code for about four months now. What I didn't expect was how much the gates matter relative to the model itself. Most of the failures I see aren't hallucinations but omissions, and a structured check catches those easily. I wrote up how I did it, and how to start building one yourself from markdown and a shell script: https://michael.roth.rocks/research/543-hours/#10