Hacker News new | ask | show | jobs
by wild_egg 54 days ago
Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.

This has completely solved the cheating and fudging to make tests pass for me.

1 comments

So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?

How long before agent 1 leaves notes for agent 2 to not tattle on it?

"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."