|
|
|
|
|
by wild_egg
54 days ago
|
|
Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it. This has completely solved the cheating and fudging to make tests pass for me. |
|
How long before agent 1 leaves notes for agent 2 to not tattle on it?
"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."