by jjmarr
113 days ago
I created a code review pipeline at work with a similar tradeoff and we found the cost is worth it. Time is a non-issue. We could run Claude on our code and call it a day, but we have hundreds of style, safety, etc. rules on a very large C++ codebase with intricate behaviour (cooperative multitasking be fun). So we run dozens of parallel CLI agents that can review the code in excruciating detail. This has completely replaced human code review for anything that isn't functional correctness but is near the same order of magnitude of price. Much better than humans and beats every commercial tool. "Scaling time", on the other hand, is useless. You can just divide the problem among subagents until the whole review finishes within a few minutes, and that also increases quality because each agent gets less context and more focus.
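A minimal sketch of the "divide the problem with subagents" idea, not the actual pipeline described above. `review_shard` is a hypothetical stand-in for one CLI agent invocation, and the shard size and worker count are made-up defaults.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk(items, size):
    """Split items into consecutive groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def review_shard(diff, rules):
    # Hypothetical stand-in for one agent run: a real version would launch
    # a CLI agent with only this small rule subset in its context.
    return [f"check '{rule}' against {len(diff)} chars of diff" for rule in rules]


def review(diff, rules, rules_per_agent=5, max_workers=8):
    """Fan the rule set out over parallel reviewers and flatten the results."""
    shards = chunk(rules, rules_per_agent)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves shard order, so comments come back in rule order.
        results = pool.map(lambda shard: review_shard(diff, shard), shards)
        return [comment for shard_comments in results for comment in shard_comments]
```

Smaller shards mean more agents and more cost, but each agent sees fewer rules, which is the focus effect the comment describes.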
> So we run dozens of parallel CLI agents that can review the code in excruciating detail. This has completely replaced human code review for anything that isn't functional correctness but is near the same order of magnitude of price. Much better than humans and beats every commercial tool.
Sure, you could make multiple LLM invocations (different temperature, different prompts, ...). But how does one separate the good comments from the bad comments? Another meta-LLM? [1] Do you know of anyone who summarizes the approach?
[1]: I suppose you could shard that out for as much compute as you want to spend, with one LLM invocation judging/collating the results of (say) 10 child reviewers.
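To make the footnote concrete, here is a rough sketch of that judge/collate tree. Everything in it is an assumption for illustration: `judge` is a hypothetical stand-in for a meta-LLM call (here it only dedupes and keeps the highest-confidence comments), and the `confidence` field, `fan_in`, and `keep` are invented names.

```python
def chunk(items, size):
    """Split items into consecutive groups of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def judge(comments, keep):
    # Hypothetical stand-in for one judging LLM invocation. It drops
    # duplicate texts and keeps the `keep` most confident comments.
    best = {}
    for c in comments:
        seen = best.get(c["text"])
        if seen is None or c["confidence"] > seen["confidence"]:
            best[c["text"]] = c
    return sorted(best.values(), key=lambda c: c["confidence"], reverse=True)[:keep]


def collate(comments, fan_in=10, keep=3):
    """Reduce comments level by level, one judge per `fan_in` inputs."""
    if not 0 < keep < fan_in:
        raise ValueError("keep must be positive and smaller than fan_in")
    level = list(comments)
    while len(level) > fan_in:
        # Each judge sees at most fan_in comments and forwards at most keep,
        # so every pass shrinks the list and the loop terminates.
        level = [c for group in chunk(level, fan_in) for c in judge(group, keep)]
    return judge(level, keep)
```

With 100 child comments, `fan_in=10`, and `keep=3`, this takes two reduction passes and one final judge call, so the number of judge calls grows roughly linearly with the number of reviewers.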