Hacker News new | ask | show | jobs
by rtpg 3413 days ago
The problem is that review complexity is quadratic in size, and quadratic in number of interacting parts.

Each thing can interact with each other thing (in theory), so if you have double the diff, you might need 4 times the amount of work. And the increased workload increases the risk of a mistake.

But splitting up the diff in two doesn't change the result, right? The reality is that everything doesn't interact with everything else. But the reviewer doesn't know this, and needs to confirm non-interaction (hence a review).

If you, as an implementer, know the divisions, then shipping the pieces that are independent will do two things. First, it makes the review easier, letting it serve its main purpose: catching errors in implementation. Secondly, it will let _you_ confirm you know the divisions. It could be that your conception is wrong! Your small diff might not work because of an interaction you didn't realize.

Personally, I think that by the time you're sending diffs, the general architecture should have been decided. We tend to have at least one person (not the implementor) who knows how a feature is going to be implemented, so the architecture can be confirmed. You want someone to be looking at the forest! I think that doing this during code review is a bit harder though.

Taking the time to do it right is hard, and chopping things up makes it easier to do "it" (implementation review) right. Architecture review should probably be happening in a different space.

On a social level, you are asking for your code to be reviewed. Please have empathy for the reviewer.

2 comments

>Secondly, it will let _you_ confirm you know the divisions.

Absolutely. Which is why I draw the line at "conceptual units". If there are rational sub-divisions that don't require context, go ahead, divide! If you're baking in cleanup + feature + bugfix + convert tabs to spaces, you're mixing things in the same way that you would usually avoid when coding. Same thing with too-large commits - we avoid 10,000 line files and functions, do the same with commits.

I caution only against artificially small diffs. Small is a fine target, but there's a lot of dogma around it everywhere I've been, which is why I think it's worth calling out. Breaking things up too much can remove context, which can be dangerous.

---

If you're introducing a large thing in 1 vs N units (say a full feature in one go, which is unusable until complete), the line is of course fuzzier. In some ways, sub-units may be more reviewable - that's a solid benefit, though they're not likely a dozen or two lines of code (if they are, and it's a large feature, you probably now have dozens of reviews). But you aren't getting any additional safety, because it's all either on or off - which is half of the post.

For the other half of the post, if you're not getting the same reviewer(s) for all the pieces, unless you've had a detailed architectural review[1], has anybody but you read and understood the whole system? When it's turned on and it breaks in a subtly-interacting way and you're on vacation, does someone else know where to look? Doing it all in one go, as it will be when turned on, forces an architectural review of the system implemented, not just the plans. Say it takes a day or two to thoroughly review - is that bad?

---

Always tradeoffs. For the situation described in the blog post, where large reviews get insufficient attention, I don't think small diffs will solve the problem. The problem is the lack of attention given to reviews - fix that. It's critical regardless of diff sizes.

[1]: No place I've worked for has truly done this, which I would claim is the (vast) majority, but I wholly believe some places do. I would probably even like it! But it's perceived as slowing things down, so it hasn't happened for me yet.

Great notes. Thanks for writing this!

Architecture review distinct from deploys is key, as is a separate operability review. And it's also important to mentally decouple the feature release to end users from the code releases.