Hacker News new | ask | show | jobs
by eyelidlessness 1120 days ago
After a brief scan I’d call the full change reviewable enough I could do it in a sitting. Most of it looks reviewable on my phone. But seeing >30 commits, I’d pause. Partly because I’ve become a lot more sensitive to the impact of commit history itself, partly because the quick scan of such a small change set doesn’t seem to line up with so many commits, but mostly because it implies much more context exists than the attention I’d pay if it came pre-squashed.

That kind of implication stops me in my tracks to learn more. I’ve spent literal days tracking down the meaning of single line code changes through multiple dozens of commits, sometimes across repo boundaries (ahem the original author’s suggestion of deprecating in favor of a fork comes to mind).

The size of this particular PR only becomes a factor when any one of those numerous commits can become that rabbit hole. How many humans’ days are going to be spent tracing history through this particular merge? For how many different reasons? I didn’t even look at the changes midway, but how many nuances are buried in there and lost unless this weird bundle of changes is preserved?

3 comments

I bet you're other thinking in this case. In general, the expectation on GitHub is that PR commit history doesn't matter, and owners should simply squash on acceptance. I think most contributors don't event consider that all their commits are visible or would be of interest and only think about the final product.

It's certainly simpler for the contributor to do the squashing, but when GitHub makes it so simple in practice it doesn't matter.

People wonder why Linus Torvalds dislikes github.

Imagine writing a highly performant and featureful relational database and successfully using it with large projects for a while without the database itself becoming particularly popular and then having a company come along and popilarise your database by telling lots of people about how good it is as a flat key value store.

Then people are really confused and annoyed as to why their key value store has this complicated and confusing relational database attached to it so they write lots of guides skimming over the details to help people get better at using the database to just store keys and values in one table.

If I was Linus I would be pretty pissed too.

That story strikes home for me. I worked at a company years ago where I designed and implemented a custom event queue like architecture using lua scripts in redis. The data would eventually end up in another database, but it would start its journey by being sent to redis. Once it was in redis the application servers considered it to be "committed".

Of course, sometimes bugs showed up in our system. One of the engineering leads would often say "oh lets just clear the redis cache". Every time I told him no, and once again slowly explained how we weren't using redis as a cache, and how deleting everything in redis would delete user data and be a terrible idea. He would have this far away look in his eye while nodding along and pretending he understood. I guess in his mind he was just thinking - why on earth would it be unsafe to clear the "redis cache"?

Months later I went on holidays. They ran into some bug. He reacted by wiping everything in redis. And, predictably, all hell broke loose. User data rollbacks happened, which caused cascading failures in the UI (which assumed that rollbacks would never happen). The team ended up reconstructing some lost data from some JSON which accidentally ended up in web request logs. Users had downtime as the whole app broke. It was a disaster.

When I got back to the office, I was hit with some strange combination of "why weren't you here" and "why didn't you tell us". Ugh. I still think about it sometimes. I have no idea how I could have handled that better. But I can tell you one thing for sure: I lost a lot of respect for that engineer.

As far as I understand, Linus doesn’t write code. He reviews code other people have written, and even other deputies review the code before it gets to him.

Additionally, many people are paid to work on his project by other companies. Linus doesn’t pay them, yet he’s their boss.

All of this is to say that Linus is very insulated from externalities. He can insist on his platonic ideal of a commit and SCM, if it makes his life easier. He’s like the editor at a publishing house, rejecting countless manuscripts yet never writing a word himself. And that’s fine.

However, most people do not use an SCM like Linus does. If you’re maintaining an open source project on GitHub you’re probably working for free, as are the people submitting PRs. The more difficult you make their lives, the fewer people will be willing to submit PRs and the more work you’ll have to do eventually.

It's really irrelevant what Linus does now or what your personal opinion of version control is. The point is, git does certain things well, and a large portion of its users don't seem to need or want it. Not it might be because those users don't understand what they're missing out or whatever, but git was written by Linus from endless experience dealing with both contributing and maintaining code. It's simply wrong to suggest that Linux has no idea what he's talking about from a contributor perspective. Moreover, there's still plenty of people, who actually understand git, who are happy with the workflow that it was designed around. Yes it's a complex tool, but it's not as nonsensical or confusing as the people insisting on not using it to its full extent seem to always claim it is.
I disagree pretty hard with this – for instance I've recently needed to dig into the code for the Gradio library, and when PRs are like https://github.com/gradio-app/gradio/pull/3300 (and the merge commit's message is what it is) it's hard to understand why some decisions have been made when doing `git annotate` later on.
I don't think you're in disagreement. Parent is saying that it is customary nowadays to have crap commit history. You're saying that crap commit history is problematic from a s/w engineering perspective. Both are true.
I’ve worked on a project with devs that write crap commit messages. It’s far easier to just commit to squash merging and requiring good PR titles and descriptions (which become the commit message following the squash).

Especially when you have multiple people working on a shared branch rebasing can be quite painful. The most common example of this is when sharing a branch with a QA.

I disagree, commits are mehh

Pull requests, patch notes, documentation and comments should be source of truth

Git is not project management tool, it just manages my letters history.

Curious use of `annotate` instead of `blame`. Former user of cvs, bk, hg, bzr, fossil, etc?
My favourite tool for this is the blame command in Magit (the Git client for Emacs). You can cycle between styles; one shows the commit message as a kind of header in-line with the code for instance. Another just shows a faint rule between lines that were changed in different commits. Then, one can press return to show the specific commit that the line was changed in. Well worth a try if you haven't already!
No-blame culture :grin:

(Also, JetBrains tools use "Annotate".)

> In general, the expectation on GitHub is that PR commit history doesn't matter, and owners should simply squash on acceptance.

This varies greatly from project to project, and is by no means a general expectation.

One sitting is a lot for a project you've forgotten about, lost context on, and are no longer excited by, especially assuming you have a lot of other stuff going on in your life. And then ya, everything else you said. And there are no tests.
This is it entirely - imagine suddenly being asked to clean and organize a house you haven’t lived in for ten years. It may take you a day just to get back up to speed on the code.

(Interestingly enough Knuth praised literate programming for TeX as the reason he could get back in and fix bugs after an almost ten year hiatus - where parts of the code he had not looked at in 40 years.)

Some professional cleaners do indeed do their job on properties they might never have seen before; maybe the programming equivalent is the external consultant who is expected to quickly identify bugs in unfamiliar codebases.
>> I’ve become a lot more sensitive to the impact of commit history itself

This was something @whitequark taught me. Having a clean commit history is very important when looking back, and we we look back more often than most people thunk. Self contained commits with good messages is important. I'm still not great at concise descriptions but trying.

+1 to this. I let my team pick one: very small PRs that get squashed when merging, or bigger PRs with well-separate commits and linear history that get merged normally.

Having a bunch of “fix” and merge commits in the main branch history is terrible.