Hacker News
by btilly 779 days ago
One idiot with rebase destroys history with no trace. I worked with such an idiot in a parallel team. I can't say how many weeks of work randomly got destroyed by said idiot.

I hate rebase on shared code. I don't care how clean it looks. Don't mess with history.

I really don't understand how you can lose weeks of work. The person who did the force push would have the original commits in their reflog, and the rebase would have set ORIG_HEAD.

Everyone else that had a copy of the repo would have had a copy of the "lost" commits.

I really cannot imagine how many things would have to go wrong for weeks of work to be lost.

There is a hierarchy to these things:

- Person who destroys git history

- Person who hates destroying git history

- Person who knows how to recover "destroyed" history

- Person who knows how to truly destroy git history

> Person who knows how to truly destroy git history

The Gitsatz Haderach

> - Person who knows how to truly destroy git history

… tell me more!

A rebase can be undone, because the old commits keep hanging around in the repo. Rebase doesn't delete or rewrite anything; it just creates new commits and adjusts branch pointers. The old stuff is still there, just hard to get at because no branch points at it anymore.

You just need to find an old commit ID somewhere, normally the reflog.

The old stuff will eventually go away on its own, because git's self-maintenance (`git gc`) removes unreachable commits once their reflog entries have expired. You can also force it by adjusting the gc parameters.
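To make the "nothing is deleted" point concrete, here is a toy Python model, not git's actual implementation (`ToyRepo`, its methods and the single shared reflog are hypothetical simplifications). A rebase copies commits onto a new base and moves the branch pointer; the old tip stays in the object store, listed in the reflog and recorded in `ORIG_HEAD`, until the reflog entries are expired and gc runs.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Commit:
    parent: Optional[str]
    message: str


class ToyRepo:
    """Toy model (not real git): immutable content-addressed commits,
    branches as movable pointers, and one append-only reflog."""

    def __init__(self) -> None:
        self.objects: Dict[str, Commit] = {}
        self.branches: Dict[str, str] = {}
        self.reflog: List[str] = []
        self.orig_head: Optional[str] = None

    def _new_commit(self, parent: Optional[str], message: str) -> str:
        # A hash of parent + message stands in for git's hash of the commit object.
        sha = hashlib.sha1(f"{parent}:{message}".encode()).hexdigest()[:7]
        self.objects[sha] = Commit(parent, message)
        return sha

    def _set_branch(self, branch: str, sha: str) -> None:
        self.branches[branch] = sha
        self.reflog.append(sha)  # append-only: earlier tips remain listed

    def commit(self, branch: str, message: str) -> str:
        sha = self._new_commit(self.branches.get(branch), message)
        self._set_branch(branch, sha)
        return sha

    def history(self, sha: Optional[str]) -> List[str]:
        """Commit ids reachable from `sha` by following parents, newest first."""
        out = []
        while sha is not None:
            out.append(sha)
            sha = self.objects[sha].parent
        return out

    def rebase(self, branch: str, onto: str) -> None:
        upstream = set(self.history(self.branches[onto]))
        # Commits only on `branch`; they are copied, never modified.
        mine = [c for c in self.history(self.branches[branch]) if c not in upstream]
        self.orig_head = self.branches[branch]
        tip = self.branches[onto]
        for old in reversed(mine):  # replay oldest first
            tip = self._new_commit(tip, self.objects[old].message)
        self._set_branch(branch, tip)

    def gc(self) -> None:
        # Toy gc: only branch tips and reflog entries count as roots. Real git
        # also applies a grace period (gc.pruneExpire, two weeks by default).
        roots = list(self.branches.values()) + self.reflog
        keep = {c for r in roots for c in self.history(r)}
        self.objects = {k: v for k, v in self.objects.items() if k in keep}


repo = ToyRepo()
repo.commit("master", "Initial commit")
repo.branches["feature"] = repo.branches["master"]  # start a branch
before = repo.commit("feature", "Shiny feature")
repo.commit("master", "Someone else's commit")

repo.rebase("feature", onto="master")
assert repo.branches["feature"] != before  # the branch now points at a copy
assert before in repo.objects              # the original commit still exists
assert repo.orig_head == before            # ...and ORIG_HEAD remembers it

repo.gc()
assert before in repo.objects              # the reflog keeps it alive

repo.reflog.clear()  # roughly `git reflog expire --expire=now --all`
repo.gc()            # roughly `git gc --prune=now`
assert before not in repo.objects          # only now is it really gone
```

In real git, undoing the rebase before any of that happens is `git reset --hard ORIG_HEAD`, or a reset to a commit id taken from `git reflog`.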

I cannot imagine how one could _truly_ destroy git history. You could destroy it locally, sure, no problem. You _might_ be able to destroy it on your remote, but if you're using something like GitHub/GitLab/Bitbucket, I'm sure they'll have a cache that isn't trivial to remove from. But even if you remove it locally and from the remote, there's no way you're removing it from other people's clones. And other people could have pushed to other remotes.

Stuff "leaks" so much in git that it's really hard to lose work. The only way I could see someone losing work is if they never commit or never push. But even if you don't push and just rebase, you're not losing work. You would have to go out of your way to delete git history locally.

What went wrong was multiple teams on unsynchronized 2-week schedules, and a culture that said we had to accept the force push when other teams released.

So we released at the end of our cycle. If vague memory serves, it got used for end-of-month billing. Meanwhile someone on another team "merged" our code, and actually randomly dropped a big chunk of our work. A week later they release. Some (but not all) of our features disappear. We pull that and don't notice, because we're on a new sprint. Then, at the end of the month, users try to do billing. "Hey, why did you take away those features you built for us a month ago?" "What, we never...?"

We had no clue what happened.

This repeated a couple of times before we figured out what must be happening. We changed the release process so we could track what was actually released each time, with its history. We tracked down who we thought was making the mistake, but didn't have enough evidence to prove it to his manager. That didn't stop the idiot from making the mistake, but it did streamline recovering from it: we had the version with our feature and the current code, and "just" had to sort out conflicts rather than rewrite from scratch.

Now that you've heard the story, can you see how weeks of work could be lost before we figured it out? And can you understand how we could have lost history?

This was a decade ago. At my next job we had more competent people. But there we had huge debates between the rebase and merge people. There are arguments on both sides. My conclusion was that about 90% of the time, rebase makes things simpler and easier. But the remaining 10% of the time makes the 90% not worth it.

Just learn how to merge properly.

I'm pretty sure your code was still around at that point: by default git keeps reflog entries for 90 days (30 days for entries that are no longer reachable from the branch tip, as happens after a rebase). To be fair, though, I don't know whether those defaults were the same a decade ago.

What you're describing does sound awful, but I'm pretty sure that idiot could have found a way to mess up a merge too. The entire workflow sounds completely fucked; I'm not convinced it's entirely fair to blame rebase in that case.

> Just learn how to merge properly.

I know how to merge and rebase properly. My favorite PR merge strategy is rebase, then `git merge --no-ff`. That keeps your master branch nice and linear, but you can still see where each PR was merged in. It lets you have an "all PRs get squashed" view of the world just by adding `--first-parent` to your git commands, but also keeps the inner details for when you're running `git bisect` or spelunking to figure out why a certain line exists.
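A small Python sketch of why `--first-parent` gives the "squashed" view: in a hypothetical history where each PR was rebased and then merged with `--no-ff`, following only the first parent of each merge commit visits one commit per PR, while a full walk still reaches every individual commit. The commit names, the `parents` map and the helper names are made up for illustration; this models the graph shape, not git's own traversal order.

```python
from typing import Dict, List

# Hypothetical history: parents[0] is the first parent (the previous master tip).
parents: Dict[str, List[str]] = {
    "init": [],
    "a1": ["init"], "a2": ["a1"],  # PR A, rebased onto init
    "M1": ["init", "a2"],          # --no-ff merge commit for PR A
    "b1": ["M1"],                  # PR B, rebased onto M1
    "M2": ["M1", "b1"],            # --no-ff merge commit for PR B
}


def first_parent_log(tip: str) -> List[str]:
    """Like `git log --first-parent`: one entry per merged PR on master."""
    out = []
    while True:
        out.append(tip)
        if not parents[tip]:
            return out
        tip = parents[tip][0]


def full_log(tip: str) -> List[str]:
    """Every reachable commit, like plain `git log` (order simplified)."""
    seen, stack, out = set(), [tip], []
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        out.append(c)
        stack.extend(parents[c])
    return out


def rebased_before_merge(tip: str) -> bool:
    """True if each merge's first parent is an ancestor of its second parent,
    i.e. every PR was rebased onto the exact master tip it was merged into."""
    for c in full_log(tip):
        if len(parents[c]) == 2:
            first, second = parents[c]
            if first not in full_log(second):
                return False
    return True


print(first_parent_log("M2"))      # ['M2', 'M1', 'init']
print(len(full_log("M2")))         # 6
print(rebased_before_merge("M2"))  # True
```

The rebase step is what keeps the merge "bubbles" from crossing: each PR branches off the commit it is merged back into, so the first-parent chain reads like a squashed, linear log while the detail stays available for bisecting.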

Most people hate what I describe though, similar to mixing spaces and tabs.

My code may have been around somewhere. I suspect I'd done gc, in which case it wasn't. But my git skills then were certainly not as good as they are now. (I'd only recently switched from svn at that point.)

I agree that the workflow was a mess in multiple ways. A lot of which were organizational decisions that I was in no position to influence.

Your favorite PR strategy is fine if you're doing it locally. However, when it is done on master, you're going to have to get master again by force, because changed history creates conflicts. That means you're going to have to hope that everyone did it your way, and that no idiot created conflicts in some other stupid way that you'll suffer for later.

I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.

I've never worked directly on master anywhere. Always in feature branches that then get merged to master (ideally with my strategy). So master always moves forward, and its history is never rewritten.

Master is always locked down anyway by "something" (no idea what the technical term is on GitHub/GitLab/Bitbucket). Stopping people from force pushing to master prevents the sort of stuff that happened to you. Even if you don't have any "idiots", you really don't want a poor intern accidentally slightly pissing off everyone.
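Whatever the hosting services call it, the core of such a lock is a fast-forward rule: an update to a protected branch is accepted only if the current tip is an ancestor of the new tip, so history can grow but not be rewritten. Plain git offers this server-side via the `receive.denyNonFastForwards` setting. The Python sketch below is a hypothetical model of that check (function names and the commit map are illustrative), not any host's actual implementation.

```python
from typing import Dict, List, Optional, Set


def is_ancestor(parents: Dict[str, List[str]], a: str, b: str) -> bool:
    """True if commit `a` is reachable from commit `b` (a == b counts)."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(parents.get(c, []))
    return False


def accept_push(parents: Dict[str, List[str]], protected: Set[str],
                branch: str, old_tip: Optional[str], new_tip: str) -> bool:
    """Hypothetical server-side check: protected branches only fast-forward."""
    if branch not in protected or old_tip is None:
        return True  # unprotected or brand-new branch: anything goes
    return is_ancestor(parents, old_tip, new_tip)


# B2 is a rewritten (e.g. rebased) version of B, so B is not its ancestor.
graph = {"A": [], "B": ["A"], "C": ["B"], "B2": ["A"]}
print(accept_push(graph, {"master"}, "master", "B", "C"))    # True: fast-forward
print(accept_push(graph, {"master"}, "master", "B", "B2"))   # False: force push refused
print(accept_push(graph, {"master"}, "feature", "B", "B2"))  # True: not protected
```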

> I'd prefer to merge to head early. Merge to head often. Merge from head often. Don't have long-running shared branches. This does take some other forms of discipline though.

I agree with everything there, except I rebase instead of merge. So when I merge my branch to master, it's a nice neat little package that sits on top of master. It doesn't have the history of 10 merges I did while I was developing because I don't see the value in those merges.

But hey, to each their own. When I was younger, I used to get into heated debates about why I was right, now I don't really care. I'm either in a branch of my own and can do whatever I want, or working with someone and then I'll just copy whatever they do to not confuse them.

Unless the strategy is really bad, I'd prefer to go along with what everyone else does. When multiple people push their preferred optimum, the resulting inconsistency is clearly worse than a single suboptimal, but consistent, approach.

You never rebase shared code; you rebase your own work branch. Messing with the history of master/main/integration branches should be blocked.

Rebase is a necessary part of a workflow even if you like merge commits. You're severely missing out if interactive rebases are not part of your toolbox.

I wouldn't consider rebasing your own local commits on top of a more recent remote master to be messing with history in any meaningful way, and that's the most useful way to rebase.

I can give an example scenario.

Assuming "H(n)" is the hash of a given state of the repository content, consider this initial state of the repository (most recent first):

    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit

Now you implement "shiny feature", so the history in your branch looks like this:

    H(5) Shiny feature, improvements.
    H(4) Shiny feature, initial implementation.
    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit

You tested H(4) and H(5), and everything looks good.

Then you `git pull --rebase`, and your history looks like this:

    H(10) Shiny feature, improvements.
    H(9) Shiny feature, initial implementation.
    H(8) Pulled commit C
    H(7) Pulled commit B
    H(6) Pulled commit A
    H(3) Implement feature B
    H(2) Implement feature A
    H(1) Initial commit

You test H(10) because it's the current state of your repo; it looks good, so you merge (or create a PR, whatever).

With the usual pull request flows, `H(9)` (i.e. anything between your new "base" and your most recent commit) usually stays untested and entirely ignored by the developers, and you would only find out if you ever needed to bisect.

This isn't usually a problem, unless you have a rule that "every commit should be verified/tested" and the untested commits contain a change that doesn't break the build but still causes issues (e.g. something that's only visual, or a new config file added to a "conf.d" directory whose presence changes some behavior).
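The H(9) problem can be checked mechanically. Because H(n) is a hash of the repository content, H(9) is a new state that nobody tested, even though it contains the same change as the tested H(4). A minimal sketch, assuming you keep a record of which content hashes were tested (the `tested` set and the function name are hypothetical):

```python
from typing import List, Set


def untested_commits(history: List[str], base: str, tested: Set[str]) -> List[str]:
    """Commits added on top of `base` whose content state nobody tested.

    `history` is oldest first; ids stand for content hashes, as with H(n).
    """
    added = history[history.index(base) + 1:]
    return [c for c in added if c not in tested]


# The scenario after `git pull --rebase`, oldest first.
history = ["H(1)", "H(2)", "H(3)", "H(6)", "H(7)", "H(8)", "H(9)", "H(10)"]
tested = {"H(4)", "H(5)", "H(10)"}  # the states you actually ran tests on
print(untested_commits(history, base="H(8)", tested=tested))  # ['H(9)']
```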

To avoid this, you can squash H(9) and H(10) before pushing to a shared branch; that way only one tested commit is added on top of the existing ones.

Rebasing unpushed commits is ok. But I have yet to see a workflow with good enough guardrails to make rebasing pushed commits something you can do safely.

Protect your main branch?

One of the great advantages of git is being able to pull from other people's feature branches, not just master. So protecting just master isn't good enough.

Yeah, so you have them go through the workflow that doesn't ruin things, like pull requests?

I don't want to have to go back and forth with someone to pull their branch. I want to just be able to pull anything they've pushed.

`--force-with-lease`

And only on working branches. I do this every single day.
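`--force-with-lease` is essentially a compare-and-swap: the force push only goes through if the remote branch still points at the commit your remote-tracking ref (e.g. `origin/feature`) says it does. A toy Python model of that default behavior, with hypothetical names and plain dicts standing in for the remote and for your tracking refs:

```python
from typing import Dict, Optional


class StaleInfo(Exception):
    """Raised when the remote branch moved since we last fetched."""


def push_force_with_lease(remote: Dict[str, str], tracking: Dict[str, str],
                          branch: str, new_tip: str) -> None:
    """Toy model of `git push --force-with-lease`: overwrite the remote branch
    only if it still matches our remote-tracking ref for that branch."""
    expected: Optional[str] = tracking.get(branch)
    actual: Optional[str] = remote.get(branch)
    if actual != expected:
        raise StaleInfo(f"{branch}: expected {expected}, remote has {actual}")
    remote[branch] = new_tip
    tracking[branch] = new_tip  # a successful push also updates the tracking ref


remote = {"feature": "B"}
tracking = {"feature": "B"}
push_force_with_lease(remote, tracking, "feature", "B-rebased")  # accepted

remote["feature"] = "C"  # a colleague pushes on top, and we haven't fetched
try:
    push_force_with_lease(remote, tracking, "feature", "B-rebased-again")
except StaleInfo as err:
    print("rejected:", err)
```

The check only compares remote refs, so it cannot see work someone based on your branch but never pushed. A background `git fetch` that updates the tracking ref also quietly renews the lease.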

Not good enough: that can mean you rebase changes that someone else has based further work on (but hasn't pushed yet, or has pushed to a different branch).

Why are you having people base their work off your in-progress work? Git is not the issue with what you are describing.

> Why are you having people base their work off your in-progress work?

To collaborate more closely and reduce (or get ahead of) conflicts. The whole point of using git at all is to be able to base your work off other people's in-progress work; if you're not interested in doing that, then Subversion works better.

Rebase cannot destroy "weeks of work". No ordinary git command deletes commits, unless you have some insane garbage collection policy very far from the defaults. This is your fault for not understanding your tools.

Didn't you guys have filesystem backups of a shared git repository?

This is exactly what backups are for.