Hacker News new | ask | show | jobs
by hinkley 610 days ago
I spend a lot of time cleaning up after people who insist there are no b problems in their code despite all evidence to the contrary.

That work is easier when they haven’t squashed their changes. Because I can see how they got there and if it was a mistake or a misunderstanding.

People who prefer squash are an automatic red flag because they usually don’t like asking Why, which is a very important skill on products that are shipping and making money.

2 comments

>That work is easier when they haven’t squashed their changes. Because I can see how they got there and if it was a mistake or a misunderstanding.

That sounds like a problem with the people you work with, not with squashing in general.

>People who prefer squash are an automatic red flag because they usually don’t like asking Why, which is a very important skill on products that are shipping and making money.

This is a wild generalization. Thoughtful people squash when they think they have a set of changes that go together. If someone is jamming together stuff that does not go together then that is indeed a problem, but not a problem with squash. Nobody really wants to see the 50 edits someone made to come up with one final change.

> That sounds like a problem with the people you work with

No, it says something about me, not them. When people can't figure out problems on their own they come to me for help. Have been since I was a sophomore in college, which was a long ass time ago. Possibly before you were born (8 month account). So I have a pretty good idea where 'rock bottom' is for every class of tool I've ever used, and how often people get close to them.

I also get called in to look at bugs that other people refuse to believe exist, and bug forensics is where you really, really see the difference between a good commit history and a shitty one. If you aren't using 'git annotate' weekly or daily then you are not qualified to comment on how merges should or shouldn't be done. "I don't use it" means you don't have an opinion. "... so you shouldn't use it" is telling your coworkers you don't give a shit.

> This is a wild generalization

I think you're confusing red flag with deal breaker.

> Thoughtful people squash when they think they have a set of changes that go together.

True but useless distinction. Define 'go together'. Everyone has a different definition of this and you will never reach consensus there. Most of the people I'm thinking of here think everything for a single story 'goes together'. This is how you get an initial commit for a new module with 600+ lines of code and eight bugs you have to solve the hard way because all of the bugs showed up in a single commit.

Squashing before a PR fails Knuth's aphorism about code being meant to be read by humans and only incidentally by machines.

If you don't like that it took you three tries to figure out an off by one error in your code, that's fine. But you don't have to destroy all other evidence of your other processes in order to cover up your brainfart.

I tend to think a major contributory factor to the indifference (at best) about commit hygiene is that people vastly underestimate the power of "show commit history for this range of lines" in modern IDEs/GUIs for Git.

It's incredibly powerful for (just from decent commit messages) figuring out why some little detail in the code is the way it is.

I'm thankful every day that I get to mandate Gerrit (so rebased-patches-on-top-of-main) workflow with every individual commit going through CI.

ETA: Incidentally, I'm usually also someone who often gets called in to figure out obscure-yet-important bugs... and the commit log is instrumental to that.

>No, it says something about me, not them.

OK I totally see that now.

Speaking of red flags, your whole comment is a red flag to me, just like mentioning that common workflows are "red flags" lol.

>If you aren't using 'git annotate' weekly or daily then you are not qualified to comment on how merges should or shouldn't be done. "I don't use it" means you don't have an opinion. "... so you shouldn't use it" is telling your coworkers you don't give a shit.

More narcissistic garbage takes. There are many ways to work and if someone doesn't do it your favorite way then that doesn't mean they are reckless, incompetent, or whatever. If you told this to anyone I work with or have ever worked with in real life in the last 20 years, you'd get laughed at. I might know a lunatic who would argue with you in real life but even he might not be motivated enough to take the bait. He is a very junior-minded person as well, whose experience does not match his interests.

>Squashing before a PR fails Knuth's aphorism about code being meant to be read by humans and only incidentally by machines.

This is too reductive. You have to use common sense when squashing stuff. If you put stuff together that does not go together, then it gets harder to figure out what a changeset is supposed to do.

>If you don't like that it took you three tries to figure out an off by one error in your code, that's fine. But you don't have to destroy all other evidence of your other processes in order to cover up your brainfart.

There need be no evidence of "processes" in the end. I can see why you might want that if you're helping your coworkers figure something out. But once it's figured out then those changes should be reduced to modular changesets that each do a particular thing. Anything else will introduce pointless noise into the codebase. If you feel that some particular state of the code represents something significant, you can make a commit for that. But certainly 80% of the commits most people make are purely noise.

Nope. Jamming together related changes is bad also.

Now, of course, someone who makes a new commit every time they make a minor change, like three commits to fix three spelling errors in a comment, should be squashing that. That's a strawman.

Changes that serve a different logical purpose in a related change set should be separate.

Issues should be catched by spending time in automating tests that ensure the correct functional and non functional requirements are met not by surgically maintaining a graph of codebase snapshots.

History is preserved in the branch along the PRs if needed, and it rarely is.

I'm not saying that rebasing is useless (I default to it), I'm debating if the effort is worth it in engineering terms, which I generally don't see because the benefits seem to be small compared to the cost.

This argument seems weird to me because even without rerere I found myself doing a lot more work managing merge conflicts with git merge instead of git rebase

As a git merge fan, are there any tips or tricks you suggest beyond the stock git experience when doing git merge to minimize the amount of merge conflicts you get?

I found it was especially bad when doing a git merge on a refactor, but I admit it could just be that I abandoned git merge earlier in my career before switching to rebase and never properly learned it

My most common use cases are feature or bug branches with a lifespan between less than one day and up to one month (although I absolutely have some features on pause for even over a year, in which case interactive git rebase and partially squashing WIP commits is my current method of updating)

All this is for repos from literally just me, to a few changes a month between 3 devs, to 5-2 devs doing multiple commits per day, to some open source projects with commits landing every few minutes from multiple devs if it's like a release day

My current biggest issue with rebase is verified commits with GitHub and a bit of guilt for rewriting committed feedback from other authors on my PRs

The only time I really use git merge is when I want to see how my work interplays with more than one feature branch at once, or if the feature branch I want to integrate hasn't rebases themselves in a bit and conflicts occur

Most of the time I've encountered two engineers pointing fingers about who is responsible for a bug, it turns out that someone's bad merge transferred the git annotation from one engineer to another.

The first time this happened (that I caught) I had two engineers who were sniping at each other. One was older "Max" and not great at data structure algorithms. The other "Stan" was a decent coder but had a bad attitude and was awful with git. Somehow he thought he could raise his status by getting Max kicked off the team.

I come back from lunch one day and Stan is bitching about a bug in Max's new code that's causing issues. To keep these two from fighting I've been reviewing all of Max's PRs and the line of code Stan is complaining about I know for a fact I checked, and was relieved to see Max got it right the first time. But sure enough, the repo says Max fucked it up.

Twenty minutes of git archaeology later and sure enough, Stan messed up a merge and resolved the conflict wrong, introducing the phantom bug. So I showed him the step by step of my diagnosis and then we had another little talk about using rebase.

That sounds like Stan was rebasing someone else's work though?

And wait was Stan rebasing main?

I mean yeah, in that case it's definitely, definitely wrong to use rebase. In general though I always rebase my feature branch and only then merge it into main. Because it's rebased onto the head of main and thus a linear series of commits, there's no merge conflicts to fix.

> I'm not saying that rebasing is useless (I default to it), I'm debating if the effort is worth it in engineering terms, which I generally don't see because the benefits seem to be small compared to the cost.

For what's it worth, I agree for most projects I've been on. I've rarely e.g. used deep Git history forensics to figure out a regression, or to figure out why some code is the way it is. Usually I'm just tracking down the fairly recent squashed commit of a pull request that introduced the problem and it's obvious enough where to look to fix it.

I like the idea of clean, super fine-grained commits with good summaries but I never see people mention that this takes extra time to do, because putting a pull request together is usually a messy iterative process, and not a predictable sequence of clean independent commits.

Real work is more like "Add sketch of code ... Iterate some more ... Fix bug ... Iterate some more ... Upgrade library ... Really fix the bug ... Clean up ... Merge from main and get working ... Refactor ... Add comments ... Fix PR requests". Rebasing as you go or going back at the end to break that into chunks that will each independently make sense and pass tests costs a lot of time? Maybe I'm missing something?

The time vs benefit trade-off is probably different with huge teams and huge projects, but for solo projects, small teams, and medium projects the trade-offs are different.

Feels similar to test suite discussions. People don't mention there's a cost vs benefit trade-off to how fine grained your tests should be for different scenarios as it depends on a lot of factors you need to balance.

The graph of snapshots is for forensics when someone breaks the tests and keeps obliviously trucking along