Hacker News
by xlii 94 days ago
> We've been running Code Review internally for months: on large PRs (over 1,000 lines changed), 84% get findings, averaging 7.5 issues. On small PRs under 50 lines, that drops to 31%, averaging 0.5 issues. Engineers largely agree with what it surfaces: less than 1% of findings are marked incorrect.

So the takeaway would be that 84% of large, heavily Claude-driven PRs are riddled with ~7.5 issue-worthy bugs.

Not a great ad for the quality of agent-based development.

I regularly ask Claude or Codex to review staged work as part of my workflow. This is often after I’ve reviewed it myself, so I’m asking it to catch issues I missed.

It will _always_ find about 8 issues. The number doesn’t change, but it gets a bit … weird if it can’t really find a defect. Part of the art of using the tool is recognizing when this is happening and understanding that it’s scraping the bottom of the barrel.

However, if there _are_ defects, it’s quite good at finding and surfacing them prominently.

How many bugs does a human introduce in 1,000-line PRs and 50-line PRs?
Zero