by josephg
110 days ago
> Even with humans we hire managers to align the activity of subordinates, keeping intent and work in sync. We do this socially too. From a very young age, children teach each other what they like and don't like, and in that way mutually align their behaviour toward pro social play.

> I find that running judge agents on plans before working and on completed work helps a lot

How do you set this up? Do you do this on top of the claude code CLI somehow, or do you have your own custom agent environment with these sort of interactions set up?

I have started treating task.md as a new programming language: a markdown Turing machine we can program as we see fit, including enforced review at various stages and self-reflection steps (am I even implementing the right thing?).
In my tests it reliably executes 300+ gates in a single run. That is why I send judges at it: to refine it. For difficult cases I run the judge 3-4 times before any work starts, and each judge iteration surfaces new issues. Judge convergence on a task is decided manually; I am in the loop.
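The pre-work judging loop described above can be sketched roughly as follows. This is an illustrative sketch, not the actual setup: `refine_plan`, `judge`, `revise`, and `converged` are hypothetical names, standing in for an agent call that returns issues, a planner revision step, and the human's manual convergence decision.

```python
from typing import Callable, List


def refine_plan(
    plan: str,
    judge: Callable[[str], List[str]],            # hypothetical: returns issues found in the plan
    revise: Callable[[str, List[str]], str],      # hypothetical: planner rewrites the plan
    converged: Callable[[str, List[str]], bool],  # hypothetical: the human's manual call
    max_rounds: int = 4,
) -> str:
    """Run up to max_rounds judge passes over a plan before any work starts."""
    for _ in range(max_rounds):
        issues = judge(plan)
        if converged(plan, issues):
            break
        plan = revise(plan, issues)
    return plan


# Stubs for illustration only: the judge complains until the plan
# mentions both "tests" and "rollback".
def stub_judge(plan: str) -> List[str]:
    return [word for word in ("tests", "rollback") if word not in plan]


def stub_revise(plan: str, issues: List[str]) -> str:
    return plan + " + " + issues[0]


result = refine_plan(
    "ship feature",
    stub_judge,
    stub_revise,
    converged=lambda plan, issues: not issues,
)
```

In a real setup `converged` would be a prompt to the person in the loop rather than a lambda, which is the point of keeping it a separate callback.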
The judge proposes bad ideas about 20% of the time; sometimes the planner agent catches them, other times I do. It makes for an efficient triage hierarchy: judge surfaces -> planner filters -> I adjudicate the hard cases.
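As a rough sketch of that triage hierarchy, assuming the planner can answer accept, reject, or "unsure" for each issue the judge raises: `triage`, `planner_verdict`, and `human_verdict` are hypothetical names for illustration, not part of any real agent framework.

```python
from typing import Callable, List, Optional


def triage(
    issues: List[str],
    planner_verdict: Callable[[str], Optional[bool]],  # hypothetical: True/False, or None if unsure
    human_verdict: Callable[[str], bool],              # hypothetical: the human's decision
) -> List[str]:
    """judge surfaces -> planner filters -> human adjudicates the hard cases."""
    accepted = []
    for issue in issues:
        verdict = planner_verdict(issue)
        if verdict is None:
            # The planner could not decide, so escalate to the human.
            verdict = human_verdict(issue)
        if verdict:
            accepted.append(issue)
    return accepted


# Stub planner for illustration only: accepts bug reports, rejects
# rewrite proposals, and escalates everything else.
def stub_planner(issue: str) -> Optional[bool]:
    if "bug" in issue:
        return True
    if "rewrite" in issue:
        return False
    return None


judge_issues = [
    "fix bug in parser",
    "rewrite the whole module",
    "handle empty input",
]
accepted = triage(judge_issues, stub_planner, human_verdict=lambda issue: True)
```

Only the issues the planner cannot settle reach the human, which is what keeps the hierarchy cheap to run.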