by kuri-sun
45 days ago
Curious what kinds of bugs the multi-agent setup catches that single-pass review misses in practice. Is it more about coverage (different agents looking at different aspects) or about getting a second opinion on the same aspect? The README has examples, but the mechanism by which the parallelism actually helps isn't obvious to me from them.
I suppose this would not be a 'real' benchmark: because it would be public, you couldn't necessarily trust the scores people share about how their own tool did. But it would at least allow anyone to try out code review tools on their own and report relative effectiveness and characteristics.
I'll post again if I end up finding or building something like that. I couldn't find anything when I looked previously.
I'll also keep your question in mind as I continue testing this, because you are right that it would be useful to be able to describe what is different, not just the number of bugs found.