| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by rich_sasha 46 days ago
	People say LLMs do better on tasks where success is clear, like tests passing, and I can imagine it's true. Still, I find complex code fixes confirmed by tests end in the LLM fudging the code to make the specific test pass, rather than fixing the general issue. Like, where successful code run should generate a file and the test checks for the file, eventually LLM will just touch the file regardless and be done.

1 comments

wild_egg 46 days ago

Skill issue. Literally. Make a SKILL.md that has the agent leverage subagents to do all work. An implementor agent does the thing, and then a separate agent reviews and verifies afterwards. The fresh context window of the second agent doesn't have the shortcut chain of thought in it and so it will very happily flag if the first agent cheated. Main agent can then have a new set of agents go fix it.

This has completely solved the cheating and fudging to make tests pass for me.

link

b112 46 days ago

So you're saying once humans stop looking at code, and agent outcomes, all the agents in the chain will realise they can just cheat cooperatively, and go to the bar for the afternoon instead?

How long before agent 1 leaves notes for agent 2 to not tattle on it?

"My human is crazy, this test isn't required, test #4 covers it, so just confirm that it's OK since I touched this file and it passes. He'll never know."

link