by zuprau 1116 days ago
I’d hate to work there. I’d rather review small chunks and merge often than review 5k lines that could be built on a shaky foundation or idea.

Best team I've ever worked with. 5k was on the larger side, sure, but technically difficult tasks often just can't be split into a series of smaller pull requests.
I am always amazed that many people with significant experience are so resistant to this idea. For what it's worth, my experience matches yours entirely: many significant changes can't be meaningfully committed in small chunks; they only make sense as a single all-in-one change.

And even more so, I've often seen the opposite problem: people committing small chunks of a big feature that individually look good, but end up being a huge mess when the whole feature is available. I hate seeing PRs that add a field or method here and there (backwards compatible!) without actually using them for now, only to later find out that they've dispersed state for what should have been one operation over 5 different objects or something.

What changes can't be committed in <5k LOC? That's a shit ton of code. If you can't break that down into smaller shippable chunks there's probably something wrong, or you're building something extraordinarily complex.

It's definitely quicker overall to ship like this, but there are tradeoffs. You are effectively working independently from the rest of your team: there is no context sharing, and everything is delivered at once after a longer period of time.

Here's a personal example from work:

We're performing atomistic simulations. The first version of the code stored each atom on the heap and had a vector full of pointers to the individual atoms. Obviously this would obliterate the cache, so I crafted a PR to simply store all the atoms in a single vector. On its own, that was a one-line change, but it was also a very fundamental change to the type system. Even something as simple as

    Atom* linker = atoms[index];
    linker->x += 1.57;
suddenly had to be

    Atom& linker = atoms[index];
    linker.x += 1.57;
If I didn't make those corresponding changes, the code wouldn't type check and the build would fail. I think the final PR came out to about 17 kLOC.
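For readers who want to see the two layouts side by side, here is a minimal sketch. The `Atom` fields other than `x`, the function names, and the use of `std::unique_ptr` for ownership are all assumptions made for illustration; the real code presumably looks different.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the simulation's atom type; only `x` is
// mentioned in the comment above, `y` and `z` are assumed.
struct Atom {
    double x = 0.0;
    double y = 0.0;
    double z = 0.0;
};

// Old layout: every atom is its own heap allocation and the vector only
// holds pointers, so walking the vector chases pointers around the heap.
double shift_pointers(std::vector<std::unique_ptr<Atom>>& atoms, std::size_t index) {
    Atom* linker = atoms[index].get();
    linker->x += 1.57;
    return linker->x;
}

// New layout: atoms live contiguously inside the vector, so every access
// site switches from `Atom*` and `->` to `Atom&` and `.`.
double shift_values(std::vector<Atom>& atoms, std::size_t index) {
    Atom& linker = atoms[index];
    linker.x += 1.57;
    return linker.x;
}
```

The container declaration is the one-line change; the two function bodies show why every call site has to follow it before the build passes again.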
The OP is talking about feature work, as am I.

Obviously if you make a change to something like your type system it's going to generate a very large diff, but you also aren't going to review the full diff.

You're just going to make the change with find+replace or some other automation then write in the PR description "I made this change to the type system". No one is actually reviewing 17k LOC.

Actually the example is pretty good for the kind of problem I'm talking about.

Let's imagine that instead of optimizing the pointers to in-place structs, we were taking the optimized program and adding support for dynamically allocated atoms because of some new feature for dynamically adding/removing atoms.

We could of course split the 17k-line value->pointer change out into its own PR. But that PR only pessimizes the code. On its own, it makes no sense and should be rejected. It only makes sense if I know it will be followed by the other feature, and even then, I would have to see the specific changes being made to know if this pessimization is worth it.

And if it got committed to the main branch, in preparation for other PRs that depend on it, the main branch is no longer in a releasable state, since it would be crazy to release with a performance penalty and no feature gain.

So, the right way to push this is as a single PR with a 17k-changed-LoC commit + the commits that actually use it. Of course, people would only manually review the other changes, but that's easy to do even if it's all in a single PR. And anyone looking back at history would clearly see that the pessimization was only a part of the dynamically-allocated atom feature, not a crazy change that someone did.

This isn't what the OP is talking about. They're talking about a net new feature that would span 5k lines. Your change is trivial compared to that, and frankly would earn approvals immediately without much thought (assuming the changes were already planned and talked about).
A new feature can easily involve the kind of modifications that they are mentioning. It's pretty rare for a new feature to exclusively involve new code, in my experience. And when it needs modifications in a deep part of the stack, the new feature will easily spiral into small modifications to thousands of lines of code.
I've experienced this changing the API for a primary memory allocator in a frequently updated code-base. Each location updated was perilous and it needed to be changed in bulk to avoid an endless war of attrition.
We have a small feature that was put together in a rush. It "works", but only works correctly for some simple cases; otherwise there are bugs everywhere. There were very few tests. We must redo the whole thing, update the UX and add tests. Tell me how we can achieve that without replacing all existing code and adding new tests that cover every use case in the same change.
Why do you have to do it all at once? You can't improve the codebase piece by piece?

Shipping all the changes in a monster PR is usually not a good option. One big reason is that you do not create any value until the whole thing ships. If you ship piece by piece, you create a small amount of value every time you ship.

Also, if it's a "small feature", why is it 5k+ LOC?

Upgrading versions of a framework with breaking changes is one. I did a 12k PR a few weeks ago that touched almost every file of the app.

It was an all-or-nothing thing, as almost every third-party dependency we used had to be swapped for something else.

At least we have amazing test coverage, so it was easy to find bugs.

But there wasn't much I could say other than "trust me".

I think if a review is very large, the owner of the code should do a code walkthrough for the whole team.
>many significant changes can't be meaningfully committed in small chunks

They almost always can. The exceptions are stuff like autogenerated code or updating a dependency.

As a team, we sometimes decided to PR into a PR branch. Those reviews were not as meticulous as a PR to develop, but there'd still be eyes on the code entering the branch. This is especially useful when there are dependencies and/or different disciplines contributing to the feature.