Hacker News new | ask | show | jobs
by rprospero 1118 days ago
Here's a personal example from work:

We're performing atomistic simulations. The first edition of the code stored each atom on the heap and had a vector full of pointers to the individual atoms. Obviously this would obliterate the cache, so I crafted a PR to simply store all the atoms in a single vector. On its own, that was a one line change, but it was also a very fundamental change to the type system. Everything as simple as

    Atom* linker = atoms[index];
    linker->x += 1.57;
Suddenly had to be

    Atom& linker = atoms[index];
    linker.x += 1.57;
If I didn't make those corresponding changes, the code wouldn't type check and the build would fail. I think the final PR came out to about 17 kLOC.
2 comments

The OP is talking about feature work, as am I.

Obviously if you make a change to something like your type system it's going to generate a very large diff, but you also aren't going to review the full diff.

You're just going to make the change with find+replace or some other automation then write in the PR description "I made this change to the type system". No one is actually reviewing 17k LOC.

Actually the example is pretty good for the kind of problem I'm talking about.

Let's imagine that instead of optimizing the pointers to in-place structs, we were taking the optimized program and adding support for dynamically allocated atoms because of some new feature for dynamically adding/removing atoms.

We could of course split the value->pointer 17k line change into a single PR. But, that PR is only doing a pessimization of the code. On its own, it makes no sense and should be rejected. It only makes sense if I know it will be followed by the other feature, and even then, I would have to see the specific changes being made to know if this pessimization is worth it.

And if it got committed to the main branch, in preparation for other PRs that depend on it, the main branch is no longer in a releasable state, since it would be crazy to release with a performance penalty and no feature gain.

So, the right way to push this is as a single PR with a 17k-changed-LoC commit + the commits that actually use it. Of course, people would only manually review the other changes, but that's easy to do even if it's all in a single PR. And anyone looking back at history would clearly see that the pessimization was only a part of the dynamically-allocated atom feature, not a crazy change that someone did.

You specifically mention "big feature" in your original comment so it's confusing that this is a good example of the kind of problem you're talking about.

This is a very different situation than large PRs containing feature code. I think most people would agree that one large PR is the correct approach for this kind of situation.

Usually new features require modifications of old code, at least in my experience. If I came across as claiming it's likely for a feature to require 5k new lines of code, then I clearly communicated badly. But a feature coming with 5k lines of modified code, while rare, still happens several times a year in a large project in my experience.
This isn't what the op is talking about. They're talking about a net new feature that would span 5k lines. Your change is trivial compared to it, and frankly would earn approvals immediately without much thought (assuming the changes were already planned and talked about)
A new feature can easily involve the kind of modifications that they are mentioning. It's pretty rare for a new feature to exclusively involve new code, in my experience. And when it needs modifications in a deep part of the stack, the new feature will easily spiral into small modifications to thousands of lines of code.
In that case I also see "large" PRs, but I point the reviewer to "file X and then 4999 copies of file Y." When doing large no-change refactors, I submit standalone PRs and merge them quickly because no one will review 1k identical changed lines (e.g. if I change the location of /lib/ imported 600 times in the project)
What if the refactoring makes the code harder to understand when looked at in isolation, but is necessary for the rest of the feature to work? Why submit it in a separate PR without the context of why it's necessary?