Hacker News
by wongarsu 52 days ago
See also https://marginlab.ai/trackers/claude-code-historical-perform... for a more conventional approach to tracking regressions.

This project is somewhat unconventional in its approach, but that might let it reveal issues that typical benchmark datasets mask.