by agrishin 86 days ago
I found that running an agent in a Ralph loop, showing it the agent's instruction text and saying "run this; if it fails, identify the reason and modify the agent instructions to avoid it; the acceptance criteria are this and that", worked surprisingly well. I'm not sure it qualifies as self-referential self-improvement, but it was something.
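A minimal Python sketch of that loop, for illustration only. `run_agent` and `revise` are hypothetical callables standing in for whatever actually executes the agent and rewrites its instructions (in the setup above, the agent does both itself). `RunResult`, the stub functions, and the `max_iterations` cap are assumptions, not part of the described setup.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RunResult:
    passed: bool              # did the run meet the acceptance criteria?
    failure_reason: str = ""  # the agent's diagnosis when it did not


def refine_instructions(
    instructions: str,
    run_agent: Callable[[str], RunResult],   # hypothetical: executes the agent
    revise: Callable[[str, str], str],       # hypothetical: rewrites instructions
    max_iterations: int = 10,                # assumed safety cap
) -> Tuple[str, Optional[int]]:
    """Run the agent; on failure, fold the diagnosed reason back into the
    instructions and retry. Returns the final instructions and the iteration
    that passed, or None if no iteration passed."""
    for iteration in range(1, max_iterations + 1):
        result = run_agent(instructions)
        if result.passed:
            return instructions, iteration
        instructions = revise(instructions, result.failure_reason)
    return instructions, None


# Usage with stub callables (hypothetical, for demonstration only).
def fake_agent(instr: str) -> RunResult:
    if "use UTC" in instr:
        return RunResult(True)
    return RunResult(False, "timestamps compared across timezones")


def fake_revise(instr: str, reason: str) -> str:
    return f"{instr}\n- Avoid: {reason} (use UTC)"


final, passed_at = refine_instructions("Build the report.", fake_agent, fake_revise)
```

Keeping the agent runner and the reviser as injected callables makes the loop itself trivial to test; the interesting behaviour lives entirely in how `revise` turns a failure reason into a better instruction.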

I'm currently running autoresearch against my harness, which autonomously builds SaaS applications against an enforced architecture. Autoresearch managed to improve the harness's performance on my 'time-to-Realworld' benchmark, in which Claude Code drives the harness to build an implementation of https://github.com/realworld-apps/realworld; the win condition is that it must pass my rigorous Postman collection and Playwright test suites. Experiments are capped at 90 minutes, and the metric it optimises is a weighted combination of the number of tests passing, alignment with harness-engineering best practices, and time to completion.
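The comment doesn't give the actual weights or normalisation, so the following Python sketch only shows one plausible shape for such a score. The function name, the weights, the 0-to-1 `alignment` input, and the linear time credit that reaches zero at the 90-minute cap are all assumptions.

```python
def experiment_score(
    tests_passed: int,
    tests_total: int,
    alignment: float,          # assumed: best-practice alignment already scored in [0, 1]
    minutes: float,            # wall-clock time to completion
    cap_minutes: float = 90.0,
    weights: tuple = (0.6, 0.25, 0.15),  # hypothetical (tests, alignment, time) weights
) -> float:
    """Weighted score in [0, 1]; higher is better."""
    if tests_total <= 0 or not 0 <= tests_passed <= tests_total:
        raise ValueError("need 0 <= tests_passed <= tests_total and tests_total > 0")
    if not 0.0 <= alignment <= 1.0:
        raise ValueError("alignment must be in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1 to keep the score in [0, 1]")
    w_tests, w_align, w_time = weights
    pass_rate = tests_passed / tests_total
    # Linear time credit: 1.0 for an instant run, 0.0 at (or beyond) the cap.
    time_score = 1.0 - min(minutes, cap_minutes) / cap_minutes
    return w_tests * pass_rate + w_align * alignment + w_time * time_score
```

Normalising every term to the same range before weighting keeps any one of them from dominating simply because of its units; the hard win condition (all tests passing) would sit outside this score as a separate pass/fail check.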