| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Calavar 57 days ago

I'm writing a compiler. When I have Claude write a new feature, I have validate that suite against a test suite of ~200 tiny programs.

I have a shell script that automates this. If all tests pass, the shell script prints "200/200 passing" with very little token spend. If only 190/200 pass, the shell script reports the names of every test that failed, and now Claude does a process of

1) run the compiler binary -> 2) get assembly output and inspect for obvious errors -> 3) assemble -> 4) verify that the assembler did not report errors -> 5) run test binary, connect with gdb, and find the issue -> 6) edit the compiler source -> 7) recompile the compiler -> 8) back to 1

multiplied by 10 for the 10 failing tests. This eats up tokens very quickly. I realize that not every use case is going to look like this. But if I didn't have Claude verify against the test suite, then I'd be getting regressions left and right, and then what's the point?

The whole codebase (tests included) is less than 15k lines, so I don't think that's the issue. No MCPs. CLAUDE.md about 1.5k lines.