by jjmarr 610 days ago
I was on a team where we wrote software tests for computer hardware. Regressions were frequent. The underlying hardware wasn't very reliable because it was all very early-stage and hadn't been tested yet (as it was our job to write the tests in the first place).

The linear commit history created by rebasing made it trivial to bisect and determine what introduced the problem.

Huge difference to my productivity.

2 comments

git bisect will traverse both parents of a merge commit no problem. Did you try?
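For anyone who wants to check this, here is a minimal sketch in a throwaway repository. The branch names, file names, the `commit` helper, and the "BUG" marker are all made up for illustration; the bug lands on a side branch and reaches main only through a merge commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # call the first branch "main" on any git version
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: append a line to a file and commit it.
commit() { echo "$2" >> "$1"; git add "$1"; git commit -q -m "$3"; }

commit main.txt base "main: base"
git tag known-good

git checkout -q -b feature
commit feature.txt f1 "feature: harmless change"
commit feature.txt BUG "feature: introduce bug"
expected=$(git rev-parse HEAD)
commit feature.txt f3 "feature: follow-up"

git checkout -q main
commit main.txt m1 "main: unrelated work"
git merge -q --no-ff -m "Merge feature" feature
commit main.txt m2 "main: more work"

# Stand-in test: a commit is bad if feature.txt contains the BUG line.
git bisect start HEAD known-good >/dev/null
git bisect run sh -c '! grep -q BUG feature.txt 2>/dev/null' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format='first bad commit: %s' "$first_bad"
# prints: first bad commit: feature: introduce bug
```

Bisect lands on the branch commit itself, not on the merge commit that brought it into main.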

In your situation I'd prefer merges because: if commit X used to have parent A, and you move it over to parent B, it gets a new commit hash and a version of the code that has never been tested. If that commit is broken: was it broken when the author wrote it, or did it only break when you rebased? You threw away your only means of finding out when you rewrote history.

What you need is a "git rebase" that records a second parent for each commit pointing to the original commit that is being rebased.

People who prefer git rebase workflow will hate the complicated history they see in "git log", but otherwise it will be the same.

Alternatively, the right way to use "git merge" is to merge every successive commit of a branch one by one.

The problem with "git merge" is that it collapses multiple commits into one giant patch bomb.

If one of the commits caused a problem, you don't have that commit isolated on the relevant stream (the trunk) where you are actually debugging the problem.

You know that the merge introduced a problem, and it seems that it was a particular commit there. But you don't have that commit by itself in the stream where you are working.

It can easily be that a commit which worked fine on a branch only becomes a problem in its merged form on the trunk, due to some way a conflict was resolved or whatever other coincidence or situation. Then, all you know is that the giant merge bomb caused a problem, but when you switch to the branch, the problem does not reproduce and thus cannot be traced to a commit.

If that commit is individually brought into the trunk, the breakage associated with it will be correctly attributed to it.

In both cases, the source material is the same: the original version of the commit doesn't exhibit the problem on its original branch.

It is pretty important to merge the individual changes one by one, so that you are changing fewer things in one commit.

People like rebase because it does that one by one thing. Git rebase breaks the relationship by not recording the extra parents, but since they have the reworked version of each change on the stream they care about, they don't care about that. Plus they like the tidy linear history.
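A rough sketch of the "merge every commit one by one" idea, assuming a throwaway repository with made-up branch and file names. Each trunk merge brings in exactly one branch commit, and the second parent still points at the original, unrewritten commit.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b feature
for n in 1 2 3; do
  echo "change $n" > "f$n.txt"
  git add "f$n.txt"
  git commit -q -m "feature: change $n"
done

git checkout -q main
echo trunk > trunk.txt; git add trunk.txt; git commit -q -m "trunk: unrelated work"

# Merge each feature commit separately, oldest first, instead of one big merge.
for c in $(git rev-list --reverse main..feature); do
  git merge -q --no-ff -m "Merge: $(git log -1 --format=%s "$c")" "$c"
done

merges=$(git rev-list --merges --count main)
echo "merge commits on main: $merges"

# The trunk as seen along first parents: one merge per original change.
git log --first-parent --format=%s main
```

If a merge in the loop hits a conflict, the loop stops there. A real workflow would resolve it and continue, which is where the "reworked version of each change" ends up being recorded on the trunk.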

I didn't have to use git bisect. I looked at commit history directly and guessed what caused the regression.

As we all tested different parts of the microprocessor and the tagging system reflected those parts, I could rule stuff out by looking at git log --oneline. The commit messages were also required to be high quality and I could get a gut feeling about what stuff a commit would touch without looking at the code.
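To illustrate the kind of filtering I mean, here is a small sketch. The "<unit>: " subject prefix is a hypothetical convention, not the actual tagging scheme we used, and the commits are empty placeholders.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical convention: every commit subject starts with "<unit>: ".
for subject in "alu: add overflow test" "cache: add eviction test" \
               "alu: cover signed multiply" "cache: tighten timing check" \
               "fpu: add rounding test"; do
  git commit -q --allow-empty -m "$subject"
done

# Whole history, one line per commit.
git log --oneline

# Only the commits tagged for one unit; everything else is ruled out at a glance.
cache_only=$(git log --format=%s --grep='^cache:')
echo "$cache_only"
```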

> if commit X used to have parent A, and you move it over to parent B, it gets a new commit hash and a version of the code that has never been tested. If that commit is broken: was it broken when the author wrote it, or did it only break when you rebased? You threw away your only means of finding out when you rewrote history.

This happened semi-frequently. We were using Gerrit and had every version of a rebased commit visible together. When code that failed automated testing got submitted, it immediately caused CI failures for everyone. It took an hour for someone unfamiliar with the code to look at the timestamp when the failures began, find the commit that caused the failures, and revert it.

I don't see how this would be meaningfully different in a merge scenario, because the merge commit also wouldn't be tested.

> the merge commit also wouldn't be tested

Why wouldn't it? This is the "not rocket science" rule of software engineering: every commit must pass the tests. There's no special exception for merge commits.

https://graydon2.dreamwidth.org/1597.html
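Where the test is cheap enough to run per commit, one way to apply that rule to a rebased branch is `git rebase --exec`, which runs a command after each replayed commit and stops at the first failure. A minimal sketch, with a made-up "usage must not exceed limit" check standing in for a real test suite:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: commit everything in the working tree.
save() { git add -A; git commit -q -m "$1"; }

echo 5 > limit.txt; echo 0 > used.txt; save "base: limit 5"
git checkout -q -b feature
echo 3 > used.txt; save "feature: use 3"
echo 4 > used.txt; save "feature: use 4"
git checkout -q main
echo 3 > limit.txt; save "main: lower limit to 3"

# Stand-in for the real tests: usage must not exceed the limit.
check='test "$(cat used.txt)" -le "$(cat limit.txt)"'

git checkout -q feature
sh -c "$check" && echo "feature tip passes on its original base"

# Replay feature onto main, running the check after every rewritten commit.
if git rebase --exec "$check" main >/dev/null 2>&1; then
  stopped_at="(nothing failed)"
else
  stopped_at=$(git log -1 --format=%s)   # HEAD is the commit whose check failed
  git rebase --abort
fi
echo "rebase stopped at: $stopped_at"
# prints: rebase stopped at: feature: use 4
```

The second feature commit passed when it was written and fails only on its new base, which is exactly the case the original hash would have let you distinguish.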

The CI tests could take hours because of compilation time + waiting for hardware. Trivial rebases without conflicts were exempted from additional testing, because by the time the test finished, someone else would've submitted to main. Merge commits likely wouldn't be tested in an alternative workflow either.

Not a case of the company being too cheap to spend the money, because there literally aren't enough engineering prototypes in the world to satisfy our CI needs for testing on them.

From their perspective, what's the difference? It would be better if all resulting commits were tested automatically after rebasing, but even if they were not, the offending commit is still wrong "in context".

Rebase can result in a long chain of commits that don't compile, which makes it impossible (or at least harder) to use automated bisect, or even semi-manual approaches like running a test case manually on each bisect step.

Did you ever try bisection without the linear history to compare? Or was this just conjecture?

I have. It was a complete fucking shitshow. In a kernel tree, doing a git bisect with the messy merge history will take you on a wild goose chase, where you land in some branch developed by an entirely different team somewhere, working on totally different hardware from you, with a different kernel version, which you have no hope of building and booting.

Hmm, as I think about this, I'm unconvinced it's a problem with merges specifically.

If you were bisecting a rebase workflow and hit a block of commits that were unbuildable and unbootable close to your breakage, I'm unsure how you would progress.

But the situation is likely better in the merge workflow: in all likelihood you could mark the entire tree as good, and bisect would stop searching all of its ancestors. That assumption is far more likely to be correct in a merge workflow.

I feel like it's most likely that bisecting the Linux kernel was in fact the shitshow.

Edit: upon further research, it appears that git bisect is a commonly used and useful tool in the kernel, and the correct response to landing in that branch would be `git bisect skip`, which should be far more informative to the algorithm than a skip in a linear history: https://nathanchance.dev/posts/working-with-git-bisect/
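A small sketch of skipping in an automated bisect, assuming a throwaway linear history with made-up file names. `git bisect run` treats exit code 125 from the test command as "cannot test this commit", which is the scripted equivalent of `git bisect skip`.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: commit everything in the working tree.
step() { git add -A; git commit -q -m "$1"; }

echo v1 > app.txt; step "c1: base"; git tag known-good
echo v2 > app.txt; step "c2: feature work"
touch broken-build; step "c3: refactor (does not build)"
echo v4 > app.txt; step "c4: more refactor (still does not build)"
git rm -q broken-build; echo v5 > app.txt; step "c5: fix the build"
touch bug; step "c6: introduce regression"
expected=$(git rev-parse HEAD)
echo v7 > app.txt; step "c7: unrelated"
echo v8 > app.txt; step "c8: unrelated"

# Stand-in test: exit 125 when the commit is unbuildable (skip it),
# otherwise exit 1 if the regression is present and 0 if it is not.
git bisect start HEAD known-good >/dev/null
git bisect run sh -c 'test -e broken-build && exit 125; ! test -e bug' >/dev/null
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset >/dev/null 2>&1

git log -1 --format='first bad commit: %s' "$first_bad"
# prints: first bad commit: c6: introduce regression
```

This works out here because the unbuildable commits are not adjacent to the regression. If the skipped range sat directly next to the first bad commit, bisect could only report a set of candidate commits, not a single one.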