Hacker News new | ask | show | jobs
Large pull requests slow down development (graphite.dev)
62 points by k_dumez 937 days ago
17 comments

Where I work it takes several hours to get a PR past all the merge gates. If I have 5 small PRs that all build one on top of the other, it can take me the better part of 2 weeks to get them all merged, counting all the context-switching, tweaking based on comments and relaunching of the merge checks, etc.

The only way developers are able to make any kind of velocity in their work is to babysit their PRs morning, evening, and night. "Time for bed. Let me crack open my laptop and check to see if my PR made it past the merge checks. Oops, $RANDOM_CHECK failed because the server that runs it was borked. Let me kick that off again and hope it works overnight so first thing in the morning before I leave for the office I can check again to see if there's some other random failure I can fix and then re-kick off the merge checks so they can run while I'm commuting into the office."

In spite of having it broken into 12 neat commits, a reviewer demanded that I break one of my PRs into multiple PRs. I don't work outside of standard business hours, so I've been iterating on it for the past 4 weeks now in between meetings, design reviews, interviews, etc. Meanwhile a team of about 4 people have been blocked waiting for the full patchset to fully go in.

If instead it were one big PR with multiple commits as I originally had it, I can have gotten it merged in a few days.

I mean this is mostly just a devops issue though no? PR’s used to take a solid week to merge at my company with a 10 hour CI that had a ~50% success rate when the code worked. But we put a bunch of resources toward build times and ci run time and now it mostly works. Yea, you can circumvent a shitty pr system by just making massive pr’s but wouldn’t it be easier to have the devops team fix your broken system?
Depends on what your code does. A lot of my code runs robots. You can't scale horizontally because you can only fit so many systems in a room before you run afoul of production limits, safety requirements or cooling capacity. You can't scale vertically because zoning. You can't run the tests faster because physics.

So, you either run the expensive tests asynchronously and all that entails or you allow CI to take forever.

In that case, would it make more sense to have a small PR that you start with, get it reviewed, then open smaller PRs to your original PR? That way your approach/code can be reviewed while in your larger PR you tweak and fix tests.
Not as a (good) general solution. Some things are just hard to break down into smaller logical units. E.g. a complicated new driver might introduce thousands of lines of changes. Half a driver usually won't compile, but you could break things down into PRs over individual functions (in call order so there aren't compiler warnings). That's a wildly unnecessary amount of effort though. You could write a skeleton driver first too, but that's rote declarations barely worth reviewing even as part of a larger PR.
> devops team

Ah… about that

Basically the top antipattern in the book
Isn't the problem here the flaky and long running PR gates? There's room for nuance to say "small PRs are a north star but large PRs are sometimes OK too"
You must work at a big tech company. Thats utter insanity.
Well okay, yes, if your process is that broken then you have to make do. In shops with a vaguely reasonable path to merge a PR, time for human reviewers to sign off dominates and smaller changes are the expedient path to get changes merged quickly.
I'm writing a book, "Bytes Of Wisdom: From Start To Merged" that focuses around code reviews.

I agree that larger PRs can increase the latency of getting feedback and getting quality feedback. I also have another view.

Most code reviews struggle not from the reviewing part and shockingly not the authoring part, but the laying groundwork part. The laying the groundwork part is before any hands to the keyboard to write actual implementation. This part focuses on explaining what is trying to be solved and for what problem. It is bringing everyone who would like to participate in the conversation up to speed on terminology, background, understanding of past decisions and the different proposals looked at, ruled out, etc.

I've seen too many PRs ask for reviews where the reviewer had to read the description half a dozen times and spend an additional hour understanding other systems that interact with the diff to be able to even attempt to discuss if the implementation solution is even the "correct" implementation that should be tackled for the problem at hand.

IMO, there are four key parts to better quality PRs, laying the groundwork, authoring the technical implementation, reviewing and feedback.

Feel like this is complicated by the fact that a lot of the orgs I worked followed Agile and had cards w/ the work divided up in weird ways. A lot of times I was doing shit just b/c the card said so, if I had more autonomy I would of been doing a totally different subsection of work.
I agree with you for significant changes. But most day to day programmers are working on large and/or mature codebases, and I would think that this level of groundwork should not be required for a typical change. Indeed I would hope I don't need to bring multiple stakeholders along on the journey in order to add a couple of columns to a report or upgrade an API schema version.
I understand and it is a bit of a struggle I'm facing in the writing to have the reader believe that a large part of what is holding them back is this very thing. A lot of my technical audience has the sense of, I should just be able to make my change and shouldn't need multiple eyes along the journey for feedback.

I had an experience very similar to your, "just add a couple of columns". That experience which I'll explain was the final straw to getting me to start drafting the book.

An individual at an enterprise company made a PR to change the code from handling a single item to a list of items. Of course this showed promising performance because it reduced a bunch of the overhead operations that used to be O(N) down to some sub N amount.

It took me about 3 hours to fully understand the surrounding systems. At that point I did not review the implementation, BUT instead the problem. A system was falling behind and timing out. This implementation did help fix the issue by batching API calls. While I as the reviewer did the "laying groundwork" portion I realized that a lot of the API calls themselves did not need to be triggered in the first place.

Looking purely at the implementation for review was a LGTM (Looks Good To Me) kind of review because I could not weigh out the solutions to the problem without the information on the systems to understand where the problem could be.

Well that sounds like a good read! I've been privileged to spend my 10+ years writing software among pair-programmers, and have never been made to suffer from this "code review" ritual. Maybe it's not so bad? I'd love to read a book about it before my luck runs out and somebody foists it upon me. Just to be prepared, you know.

Seems to me like it just can't work. Show a programmer a 5 line program, as the adage goes, and get 5 critiques. Show her a 500 line program, and get a thumbs up; "looks good!"

Really curious to know what keeps professionals from resorting to that timeless adage. Besides sheer chutzpah, how do folks muster the gumption to tell their peers what should have been tackled? Or, do I have my hat on backwards?

Code review is not a "ritual". Our code is our output, and it is paid work. Most professions have some level of review. Go speak to a chartered accountant or an engineer and complain about the "ritual" of peer review and see how far you get.

We are professionals and it's about time we started acting like it.

Of course code review is a ritual. All professionalism is. We establish norms and rites the performance of which increases shared understanding and predictable outcomes. I am not sure I understand what malign connotation this word carries for you.
I suppose I agree but I don't think it's really accurate.

You can make large, easy to review PRs but if the commit history isn't useful then you have to view it all in one big chunk.

If instead you take a note out of the patchset workflow and make each commit it's own fairly discrete and digestible change after you wrap up development on the feature, then reviewers can just start on commit 1/N and review each one as if it was its own mini PR.

A 10k LOC PR is going to be a pain to review if it's treated as one big commit but if it's broken into a bunch of <100 to 500 LOC commits, each with a descriptive title, a useful message, and functioning code/tests, then even if reviewing takes a long time, you can review and test each discrete chunk fairly quickly without too much cognitive load. And then as you keep making revisions, the changes per commit revision get smaller and may only take a few minutes to review, even if it is a 1k+ LOC revision.

I am not exaggerating when I say that if I had the power to do so, if you gave me a 10,000 line PR I would fire you on the spot. As far as I'm concerned that is professional negligence.

(Obviously I make exceptions for things like large scale code generation or automated refractors).

Yeah I'd agree if there hasn't been prior review and it's not new code. I normally would set my hard limit around 1-2k LOC and I'm expecting quite a few commits to break that up into something digestible or else it's an instant reject.

But for new code? I'm okay with more LOC than normal but I think 10k lines is my hard limit (including tests and documentation). And I expect no more than 500 lines in any given commit/patch. And realistically I am expecting that patchset/PR to be around 50-100 discrete commits/patches in length. Also it's likely that once things start stabilising that patchset can be broken up and merged in chunk by chunk.

And with prior reviews (i.e. merging an incubation branch into mainline) I expect there to be effectively no changes other than merge resolutions to aggregate PRs/patchsets into the incubation branch that is getting pulled into mainline.

Most people are not that good at git from my experience to be able to do this.
I agree but only because we don't expect people to be.

We expect clean, readable code so we develop engineers who can write that code.

If we expect clean, readable commit histories (like any project on the lore. https://lore.kernel.org/), we can develop engineers who produce those commit histories.

There's an adoption cost but IMHO it's worth it.

And tools that facilitate and encourage that behavior.

I’ve tried to do this but the defaults on GH means even if I stack 5 commits each doing atomic things nobody reviews it as anything other than a single patch. And if I split them into distinct PRs the last one looks like the giant PR. And we’re in now I’m shepherding 5 PRs.

Until we have tools or at the very least processes that encourage that behavior nobody will do it.

And don’t get me started on commit message display.

I believe that there's certain things (particularly ambitious net new feature development) that isn't served well by artificially minimizing PR size.

You should absolutely strive to keep smaller PRs, but I've frequently seen this become "you should only have small PRs."

> artificially minimizing PR size

Not sure I understand the "artificial" part here. There's nothing "artificial" about breaking up your larger changes into smaller PRs. It's just good practice.

Helps reviewers who are reviewing the code, and helps the author be more focused with their changes.

Even in net new feature development it's a good idea to break up your large changes to something more manageable.

Sorry if I'm not understanding, what do you believe the downside to be?

It's artificial to break up a PR to satisfy the rule of small PRs.

Often you need the full context when evaluating a new feature end to end.

Or you spend 2 days splitting up a PR into smaller PRs so that a person can review it in 30 minutes instead of 2 hours.

I can't say I've ever seen benefit from it both as a reviewer or as a developer, but it could be an effect of different companies and different teams.

I'm not sure I agree? If your code isn't trivial to compartmentalize changes then that might be a code smell.

I'd agree keeping the unified context is preferable but it's probably easier to do that by having developers rebase their changes into discrete commits that can be reviewed one by one.

It's not necessarily about the units of code, it's about the bigger picture.

Let's give a hypothetical situation where you split up a PR by the backend and frontend components, but you have some extra fields and endpoints that are not used in your final product. They accidentally get left in because you miss them.

I believe the chance that it would be caught in review is significantly higher in a unified PR review instead of artificially splitting front and back into separate PRs.

And if you need to have the 2 PRs open side by side, then why split it up in the first place?

> And if you need to have the 2 PRs open side by side, then why split it up in the first place?

I agree however I think PR's should be split up. Just not into separate PRs. The solution is to actually start reviewing PRs by commit (you can do this in the PR web interface). Everything is split up and can be reviewed separately but you can still see the final diff and while each discrete unit is reviewable, the greater feature is also still one discrete reviewable item.

> "They accidentally get left in because you miss them."

In my experience, this kind of thing happens much more frequently with large PRs.

If a PR is large enough, things will slip through the cracks. If there are too many change requests, even if they're sensible and make sense, things will slip through the cracks. If there's the need for multiple checks by the reviewer, even more things will slip through the cracks.

Also, breaking up PRs in frontend vs backend is not the best idea. Build the feature iteratively if possible.

EDIT: Since the grandparent is talking about reviewing by commit: if those are more reviewable, just use those as the way to separate PRs instead, maybe?

What is the advantage of breaking up a large change into multiple small PRs, compared to multiple small independently-reviewable commits as part of a single PR?

If a PR introduces a new function that will be used in the next commit, I would much rather see that next commit using git, than hunt for it in the PR queue.

To me in a lot of cases it seems even as a reviewer, it's harder to understand big picture if someone splits up the PRs.

But as a code writer myself, for example, if I am building a new feature, firstly it's really hard for me to know what the whole thing would look like without going through it all and it's probably very iterative process as I'm doing it, so I usually wouldn't be able to split it up or it would very suboptimal to split it up before I've finished everything.

Then I would try to split it up as I've finished to appease reviewers, but again, it requires whole lot of creativity to do. Should I try to split up shared component first? Because I surely can't split up whatever is using those shared components. If I do then, people won't see whatever is implementing those shared components so they won't have understanding on why those shared components provide certain functionality etc.

Overall it complicates a lot it seems because if I was to do it during my iterative process then I would write a PR, later refactor bunch of it anyway, and I would do it in the order that feels best for me, but wouldn't necessarily be easy to understand for anyone not within it.

Honestly "you should only have small PRs" is one of those cultural things more people should invest in. Small changes really are both faster and less bug-prone.

To make it work, you have to think differently and code differently. You have to think in advance: "does what I want to do require changing a lot?" If so, think about how else you can solve your problem. Or when implementing something for the first time, think about how difficult it would be for someone to change it.

You end up building things with high cohesion, low coupling, object factories, etc. It makes for very different code that is more maintainable, flexible, easier to change. You end up not needing a big PR, or the changes are to a single high-cohesion component so it's much easier to review than changing 10 different components.

Net new feature development is often 20% or less of the job. With most guidelines like this, it's always possible to find exceptions -- but understanding why you're making the tradeoff in that situation is key.

With that said, using methods like stacking and feature flagging, I've been finding even new feature development to be possible while keeping my PRs to roughly 5 files changed or less.

I don't care much about PR size if it consists of atomic commits. This way I can focus on reviewing each commit separately, and can see the global picture from the overall diff.

Of course, past a certain size reviewing a large change becomes tiring (~500 LOCs IME), so in these cases I ask authors to create separate PRs where each one is focused on a single thing (refactor, fix, feature, etc.). This way review duty could be split across the team as well.

And then there are cases where large PRs are unavoidable, so we deal with it. But atomic commits are a must in all cases.

Also, please don't automatically squash-merge PRs! If the history is messy, clean it up, but leave atomic commits behind, and do a merge commit, or rebase, if applicable. I've found this to be a controversial topic, for some reason.

The key is to define the term "large" in this context. Pull requests size is an artifact of a humans brain ability to process changes. Therefore its best to tailor the content of a PR based on how difficult you think it is for the stakeholders you are merging can understand the change. "large" is more a measure of complexity.

When I make changes to a codebase that very few people are experts in and is generally perceived to be difficult to understand, I make "smaller" pull requests so that the change log is followable and changes easier revert.

As with many things in software development: be pragmatic, think of others, and do the best you can.

> When I make changes to a codebase that very few people are experts in and is generally perceived to be difficult to understand,

Not trying to be snarky, genuinely curious, is this a situation you find yourself in very often? Are you describing something while working at a company or in an open-source project?

Yeah I will always advocate for smaller, more easily digestible PRs - but the underlying principle is good old “know your audience.”
This article implicitly assumes that a single PR corresponds to a single commit. It doesn't touch on the idea of multiple commits in a single PR.

Consider adding a feature. I might structure this as three commits:

1. Refactor in a way that preserves existing behavior but prepares for the new feature

2. Implement the feature, creating a user-visible change

3. Remove old code rendered unnecessary with the new feature

Each of these commits is independently reviewable, passes all lints and tests, and supports bisecting. Small, separate commits like this is good.

But there's no rationale for three small, separate PRs, because they are not independent. Either all the commits should land, or none of them. If we later revert, we would revert them all. And review is made easier by seeing how the commits relate to each other.

I suspect that structuring PRs as multiple independent commits requires a certain level of git history-grooming which is difficult to learn. If git's interactive rebase were more first-class, we would have fewer, larger, and better-structured PRs.

> This article implicitly assumes that a single PR corresponds to a single commit.

It doesn't?

As for the rest of your comment, this is why we have feature flags. What you're describing should be a single PR. I don't care how many commits it is.

*Large pull requests slow down code review up to a point where large enough pull requests promote less thorough and quicker review

Spared you 5 minutes

Also it’s (indirectly) an ad for a product that they’re selling to help devs achieve this.
I can attest that super large PRs get through faster due to review fatigue.

I suspect they still slow down development in the long run though due to more bugs and less theory building among the team

There is one place where large pull requests are essential:

In dysfunctional orgs with enormous code review time per PR, with lots of nitpicking, and a CI/CD pipeline that is constantly broken and takes a LONG time to get things into a staging environment.

Maybe I’m just hopelessly cynical, but I suspect that this is representative of the average developer’s experience.
In large corps it could be yes. In smaller and more nimble companies/start ups, there's a lot of good pipelines.
and then you send it to your friend on the weekend for a blind approval? :P
It seems to me the size of the PR is a function of the complexity (i.e., the lack of simplicity in the design / architecture). In that case, large is a symptom. What's ultimately slowing things down and adding risk is the complexity.
Worst offender is when a dev decides to reformat code and you can't even differentiate at a glance what is real code change and what is reformatting.

Fortunately modern tooling formats code automatically so there's less bikeshedding.

There are some tools that can separate actual code changes from reformatting changes. I am working on https://semanticdiff.com, a VS Code Extension / GitHub App that can help you with this. There is also difftastic if you prefer a CLI based solution. It supports more languages but can detect fewer types of reformatting changes.
I like this idea of advancing beyond line-based diffs into something more meaningful. If someone renames a variable/type/module/whatever which changes X lines of code across Y files, I really don't care to see all of that, for what I'm concerned with that's a single diff.

(Assuming the code is in a language with a good type system and without reflection, I'm confident in automatic refactors and don't need to spend time on them in code review)

Just set to ignore whitespace in the review diff.
It's intuitive to look at size (as in lines changed) as a complexity metric. What is interesting about this article is that they measured what effect the number of files changed in the PR had on review and time to merge. The more files changed, the slower the review... If there were too many files changed, then the reviewer would spend less time looking at the changes before approving.
I agree, but reality is there are a lot of smart devs who are "big thinker" types and struggle with incremental development and lots small PRs towards a bigger goal.

I prefer incremental and used to roll my eyes at them, but it's just a different way of thinking and people/teams are diverse.

> [...] but reality is there are a lot of smart devs who are "big thinker" types and struggle with incremental development and lots small PRs towards a bigger goal.

It doesn't really matter. You can start with a big change initially as a 'big thinker'. You just have to break it down afterwards.

I often have a bit of feature creep when working on a change, and add all kinds of incidental fixes I find along the way.

But afterwards, I use git's tooling to 'peel off' all those extra changes and stick them into their own PRs. That makes my main PR slim again. If the main thrust is still to big after that diet, I think about breaking it down into incremental changes that make sense for reviewers.

That process of breaking down is all about telling an understandable story to reviewers (both reviewers right now, and the poor folks who have to look at my commits in a few months to figure out when and where we introduced a bug, or why certain decisions were made. Those poor folks often include my future self.)

> I prefer incremental and used to roll my eyes at them, but it's just a different way of thinking and people/teams are diverse.

Some people produce incremental PRs naturally, and that's great for them. But for the 'big thinkers' it's a learnable skill to break their PRs into manageable chunks after the fact. No need to change their natural style.

> It doesn't really matter. You can start with a big change initially as a 'big thinker'. You just have to break it down afterwards.

But that's extremely difficult I think and requires some out of the box extreme creative thinking on how to split it up after the fact. And I would also think it doesn't help at all.

I usually like to imagine how I build the new feature in head. Then I vomit out whole bunch of code, but I feel like splitting it up is extremely difficult and it will just lose context of why I did something as I did.

E.g. if I split it in a way where I only show shared components, it won't be understandable why I made them shared in certain way because you won't see the actual other logic that uses them, so reviewers are kind of left to trust that these will be used in later PRs.

And I think the way you make things reusable is likely the most important part because there's a fine balance there, you don't want to make everything too reusable that is not intended to be reusable, but at the same time you want to make it as reusable as possible considering the future, and it will take a lot of thinking to understand future implications. A reviewer won't have understanding of the reasoning without going through the process themselves on how something might be reused.

There's a lot of disagreement as well on how DRY you exactly have to be. Some people follow the rule of write it 3 times then refactor to be reusable, some want to do it immediately, etc. And having shared components first no one knows what is going on.

And then if they were to criticise my shared logic, and they want me to change anything, which could even be a change they request because of misunderstanding I will have to change all the other logic down the other PRs as well.

> But that's extremely difficult I think and requires some out of the box extreme creative thinking on how to split it up after the fact.

Thanks for the compliments, but I don't think I'm such a genius, and I manage this regularly. Splitting gets easier with practice.

It's the same as writing any technical document or a scientific paper: you have some idea, and then you decide how to split up the presentation so that your readers can make sense of it.

> E.g. if I split it in a way where I only show shared components, it won't be understandable why I made them shared in certain way because you won't see the actual other logic that uses them, so reviewers are kind of left to trust that these will be used in later PRs.

You can explain that with PR descriptions. Or you can have multiple clean-up commits in a single PR.

> A reviewer won't have understanding of the reasoning without going through the process themselves on how something might be reused.

You are allowed to tell them. That's what the PR description and code comments are for.

(It's better to put as much as is reasonable into code comments, because they are easier to see for future folks trying to understand the code. And, of course, even better is to make the code self-explanatory, where feasible. It's a hierarchy of descriptions.)

> And then if they were to criticise my shared logic, and they want me to change anything, which could even be a change they request because of misunderstanding I will have to change all the other logic down the other PRs as well.

That's always a problem, if you make a big change and only give it to review afterwards. No matter how you present that to the reviewer.

Yes, it is easier to write code that incorporates review feedback, if you can get review feedback early.

I like this take. The onus is on these devs to learn how to use available tools effectively to split up their changes.

There's a clear parallel here to the idea of a "genius" who isn't able to communicate their ideas effectively. Can they really be considered that smart if no one understands them?

Yes, though it's is all about trade-offs. It works best, if your reviewers are also putting in some work.
People can be trained to make smaller PRs. I am guilty of big PRs, but each time I'm told to split up my PRs, it helps me the next time when I remember the pain of splitting it up late. So I'm getting better about making smaller PRs. I just need people to push back more and I'll learn even faster ;)
I've been trying git-town for PR stacking for the past couple of weeks and it's pretty nice.
The title seems to imply that "slow down development" is a bad thing. Given the immense churn and the result of what passes for software these days, I disagree.
there's a principle of small batches that would be worth following

https://queue.acm.org/detail.cfm?id=2945077

Never more than 200 lines.