by latentsea
84 days ago
I'm currently running autoresearch against my harness, which autonomously builds SaaS against an enforced architecture. It managed to improve the harness's performance on my 'time-to-Realworld' benchmark, in which Claude Code drives the harness to build an implementation of https://github.com/realworld-apps/realworld. The win condition is that the implementation must pass my rigorous Postman collection and Playwright test suites. Experiments are capped at 90 minutes, and the metric it optimises for is a weighted combination of the number of tests passing, alignment with harness engineering best practices, and time to completion.
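For anyone wondering what a metric like that could look like, here is a rough sketch of one way to combine the three factors into a single score. Only the 90-minute cap comes from the setup above. The weights, the 0-to-1 alignment score, the linear time penalty, and the names `ExperimentResult` and `score` are all assumptions made up for illustration, not the actual formula.

```python
from dataclasses import dataclass

TIME_CAP_MINUTES = 90.0  # experiment cap mentioned above

# Hypothetical weights, chosen only for illustration; they sum to 1.
WEIGHTS = {"tests": 0.5, "alignment": 0.25, "time": 0.25}


@dataclass
class ExperimentResult:
    tests_passed: int
    tests_total: int
    alignment: float        # assumed rubric score in [0, 1]
    minutes_elapsed: float


def clamp01(x: float) -> float:
    """Clamp a value into the range [0, 1]."""
    return min(max(x, 0.0), 1.0)


def score(result: ExperimentResult) -> float:
    """Weighted score in [0, 1]; higher is better."""
    # Fraction of the test suites that pass (0 if there are no tests).
    pass_rate = (
        result.tests_passed / result.tests_total if result.tests_total else 0.0
    )
    # Faster runs score higher; hitting or exceeding the cap scores 0.
    time_score = 1.0 - clamp01(result.minutes_elapsed / TIME_CAP_MINUTES)
    return (
        WEIGHTS["tests"] * clamp01(pass_rate)
        + WEIGHTS["alignment"] * clamp01(result.alignment)
        + WEIGHTS["time"] * time_score
    )
```

With these made-up weights, a run that passes half the tests, scores 0.5 on alignment, and finishes in 45 minutes gets 0.5 overall. A real setup would probably weight test passes more heavily, or gate the other two terms on the win condition being met.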