by latentsea 84 days ago
I'm currently running autoresearch against my harness, which autonomously builds SaaS against an enforced architecture. So far autoresearch has managed to improve the harness's performance on my 'time-to-Realworld' benchmark, in which Claude Code drives the harness to build an implementation of https://github.com/realworld-apps/realworld. The win condition is that the implementation must pass my rigorous Postman collection and Playwright test suites. Experiments are capped at 90 minutes, and the metric it optimises for is a weighted combination of the number of tests passing, alignment with harness engineering best practices, and time to completion.
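For anyone wondering what a metric like that could look like, here is a rough sketch. Only the three inputs and the 90-minute cap come from the description above; the weights, the 0-to-1 normalisation, and every name (`ExperimentResult`, `score`, `WEIGHTS`) are made up for illustration and are not the actual formula.

```python
from dataclasses import dataclass

CAP_MINUTES = 90.0  # experiment time cap mentioned in the comment

# Hypothetical weights: the comment does not state the real values.
WEIGHTS = {"tests": 0.6, "practices": 0.25, "time": 0.15}


@dataclass
class ExperimentResult:
    tests_passed: int       # Postman + Playwright tests that passed
    tests_total: int        # total tests across both suites
    practices_score: float  # assumed to be pre-normalised to [0, 1]
    minutes_elapsed: float  # wall-clock time for the run


def score(result: ExperimentResult, weights: dict = WEIGHTS) -> float:
    """Combine the three components into a single number in [0, 1]."""
    # Fraction of tests passing; an empty suite counts as zero.
    if result.tests_total > 0:
        test_fraction = result.tests_passed / result.tests_total
    else:
        test_fraction = 0.0

    # Clamp the practices score in case the grader drifts out of range.
    practices = min(max(result.practices_score, 0.0), 1.0)

    # Faster runs score higher; a run that hits the cap scores zero here.
    clamped = min(max(result.minutes_elapsed, 0.0), CAP_MINUTES)
    time_score = 1.0 - clamped / CAP_MINUTES

    return (
        weights["tests"] * test_fraction
        + weights["practices"] * practices
        + weights["time"] * time_score
    )
```

With weights that sum to 1, a perfect instant run scores 1 and a run that passes nothing and hits the cap scores 0. Whether time should trade off linearly against test passes like this is a design choice, and a real setup might gate the time term on the win condition instead.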