| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pushedx 115 days ago

sorry, needed to edit this comment to ask the same question as the sibling:

have you run these models in an agent mode that allows for executing the tests, the agent views the output, and iterates on its own for a while? up to an hour or so?

you will get vastly different output if you ask the agent to write 200 of its own test cases, and then have it iterate from there