Hacker News new | ask | show | jobs
by mytailorisrich 1768 days ago
In fact, a commit should have a single purpose, otherwise it creates dependencies and, for instance, you cannot revert a feature without also reverting a bug fix and it makes cherry-picking a nightmare.
5 comments

With normal review cycle times, this would slow things very much.

Optimizing your development practices for being able to revert single commits seems very wrong to me. If you produce so many bugs that this is important, I'd focus on making fewer bugs.

At my current job I can't remember us reverting a commit in the three years I've been here.

We probably work in very different environments, and your views may be the pragmatic ones in your environment.

> With normal review cycle times, this would slow things very much.

I don't think it does. Ultimately you need to review everything so if you ensure that each commit is logically coherent it actually makes reviews easier.

Otherwise, in my experience, things can get quite messy and the history very difficult to follow.

It's also impossible to predict when or if you'll need to revert a commit. If you follow good practices and have discipline then you make your life much easier if things go wrong or you need to cherry-pick something, which is quite common with bug fixes if you have several branches (e.g. one branch per release).

For whatever reason, the events you say are quite common, pretty much never happens on my teams.

So we don't need to optimize for handling them.

I agree that reverting a single commit is less common than we might plan for, but don't smaller commits help review? That's how it is on my team. You can review smaller change batches instead of a whole pr or branch diff
> With normal review cycle times, this would slow things very much.

Compared to how much it would slow things down having to try to revert or correct a commit that both fixes a bug and introduces a feature? Especially if that change involved any sort of DB schema change that had to be reverted.

Anyone who has ever gone to production with a set of changes that involved an important bug fix and introduced a high-value feature, only to find that the whole thing had to be reverted because of a bug in the new feature, re-introducing the bug to customers, knows the importance of small, separable changes.

The unit of review doesn't have to be a single commit.
I agree, but I've also been told off for making many, small commits. I don't know why the complainers don't squash them together if that's the way they feel and leave the moaning out of it.
That's a little strange to me. As a reviewer, I find it substantially easier to read 30 commits that are 20 lines each than one commit that is 200 lines.

Bite sized ideas are really easy to digest. Maybe the real complaint is that the commits aren't sufficiently independent. I know some projects mandate that the full test suite be able to pass on every commit, which is really just a way of enforcing that every commit is "complete".

My view, and I think a best practice, is that for bugs the fix should be in a single, dedicated commit, even is that's just one line.

For features you may split into multiple commits if things are too big but then each commit should be self-contained and complete by splitting the feature in a number of smaller improvements.

So really, commits are what they are based on the work at hand as long as they are kept to manageable sizes. I've seen code reviews for 1000+ lines all over the place. The author may know what they are doing but the reviewers are usually lost...

More commits != Bad, but there are a few bad ways to do it. Committing incomplete changes just to save your current changes, carelessly trying stuff and then piling on fixes to previous mistakes instead of fixing them, etc. But yeah, some people are just threatened by certain things.
> I agree, but I've also been told off for making many, small commits.

We're not paying by the commit, here. As long as the commits make some logical sense, you're doing the right thing and those folks don't know how to use version control effectively.

If it's their project you sort of have to follow their rules. Otherwise I'd say just ignore them!

I'd agree when it's simple to do. Sometimes "fix a bug" leads to rewriting some code and the bug fix comes naturally - or more easily - with that.

For a contrived example, say a bug exists because a sort algorithm sometimes reverses the order of identical values. Fixing the resulting bug locally might involve handling duplicates with some extra logic, but changing the sort might result in fixing the bug as a side effect. I might argue for both because correctness should not be an accident, but we might change one fix to an assert rather than handling a case that doesn't occur any more.

A sort is called "stable" if it doesn't re-order things that compare equal. Unstable sorts are sometimes faster. If you care your code should request a stable sort. If your library only offers one sort and doesn't specify whether it's stable, I agree with your "defence in depth" strategy but I believe the right long term fix is to have the library clear up whether or not the provided sort is stable.
This may be a special case because the question comes up so often, but in general, I don’t think libraries should document all the qualities they don’t have, because that’s inexhaustible. In my opinion, if a sort function isn’t documented to do a stable sort, it must be assumed to be not stable, regardless of current implementation details. It’s often not such a big deal either: in many cases, there’s an artificial key (an id) that you can use as a tie-breaker in the comparison function.

One thing I hate is when people look at the library source code to figure this kind of thing out, since any implementation detail can change with an update. Assume the most hostile implementation allowed by the docs and you’re usually fine.

> One thing I hate is when people look at the library source code to figure this kind of thing out

In case you didn't already know: Hyrum's Law. https://www.hyrumslaw.com/ Even if the source code isn't provided and nothing is documented your users will rely on every observable nuance of your actual implementation anyway.

At Google's scale (internally, for their internal software where they could in principle fire people for doing this) this means if you change how the allocator works so that now putting ten foozles in the diddly triggers a fresh allocation instead of twenty, you blow up real production systems because although this behaviour was never documented somebody had measured, concluded ten doesn't allocate and designed their whole system around this belief and now South East Asia doesn't have working Google search with the new allocator behaviour...

In protocol design Hyrum's Law led to TLS 1.3 having to be spelled TLS 1.2 Resumption. If your client writes "Hi I want to speak TLS 1.3" the middleboxes freak out so nothing works. So instead every single TLS 1.3 connection you make is like, "Don't mind me, just a TLS 1.2 session resumption... also I know FlyCasualThisIsTLS1.3 and here is an ECDH key share for no reason" and the server goes "Yes, resuming your session, I too know FlyCasualThisIsTLS1.3 and here's another ECDH key share I'm just blurting out for no reason. Now, since we're just resuming a TLS 1.2 session everything else is encrypted" and idiot middleboxes go "Huh, TLS resumption, I never did really understand those, nothing to see here, carry on" and it works.

Thanks for reminding me of the term "stable sort". It was a contrived example. I suppose I could have abstracted even more:

Algorithm B sometimes fails on the output of Algorithm A. We can fix the issue by making B deal with that case, or we can change A so it never produces that output. Sometimes changing the algorithm in A just makes the problem go away, and maybe that was a good idea anyway. This seems too abstract, so I picked a slightly more specific (sorting) thing for A.

Mixing multiple purposes into a single commit not only makes the initial review more difficult, it makes later review very taxing when, for example, trying to sort out the “why” of a particular change when troubleshooting. If you have to look through 200 lines of non-functional cosmetic changes to suss out the three lines of functional change, you've had to reinvest basically the whole code review process time again just to get oriented.
I think those get mixed when fixing the bug is a side effect of a bigger rework, but still needs some specific tweaking for edge cases.

Cleanly split the rework and the edge cases handling can become hard enough to not warrant separate commits.

yeah, but my single purpose is "make code better". If in fixing bug I can refactor and/or fix/expand/add comments and/or document things to make similar future bug impossible or less likely I'm doing that.

Also the "single purpose" of a commit often includes: fixing or adding issue, documenting what / why of the fix, and mentoring other devs via example or co-review.