Git Workflow Basics (blog.codeminer42.com)
187 points by igor_marques 3599 days ago
12 comments

This workflow is scary. A rebase should not be part of any everyday workflow and must be reserved _only_ for exceptional situations.

Rebasing can cause the loss of history and developers should be as careful with it as system admins are with `sudo`. I can't recommend any workflow that includes it without treating it as a terrifying and scary thing. How easy is it to accidentally remove a line during interactive rebase and lose all work associated with it?

This is why my team and I moved to squash merging. Sure it has its own drawbacks, but they're far less worrisome than rebasing. If you screw up a rebase, the history is re-written or force-pushed by accident. If you screw up a squash merge, you can still check out the intermediate commits if you know the hash.

We won a Ruby award for our work on Git Reflow. There are big improvements coming this week that can make it easy for teams to tweak the workflow to suit any special needs you might have. It works on GitHub and Bitbucket and automatically creates pull requests (and makes sure they're reviewed). GitLab support coming soon (maybe this month).

http://github.com/reenhanced/gitreflow

But some of us use git-rebase every day perfectly fine and never lose information. Yes, you must be careful. No, you should never use it on public commits.

I wouldn't recommend it to others as I don't trust them to read the man page and understand what git-rebase does. Those of us who use git-rebase also know how to recover the refs before they are GCed, though I've never had to do that.

It's dangerous to pronounce certain powerful features of a system as off limits for day to day private use. That's a different pronouncement than a decision not to share "why everyone should use git-rebase often".

If you are planning on doing some tricky rebase with possible problems, you can just create a branch on the old head before you rebase. Delete the branch when you know the rebase is successful. Then you don't have to deal with the reflog at all.
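
A minimal sketch of that backup-branch habit, assuming a throwaway repository built just for the demo. The branch name `topic-backup`, the file names, and the commit messages are all made up for illustration.

```shell
# Sketch: bookmark the old head with a branch before a risky rebase.
# Repository layout, branch names, and file names are illustrative only.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # name the first branch explicitly

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt; git add one.txt; git commit -q -m "topic: one"
echo two > two.txt; git add two.txt; git commit -q -m "topic: two"

git checkout -q main
echo up > up.txt; git add up.txt; git commit -q -m "main: upstream change"

git checkout -q topic
git branch topic-backup                 # branch on the old head
old_head=$(git rev-parse HEAD)

git rebase -q main                      # rewrites topic on top of main
rebased_head=$(git rev-parse HEAD)

# If the rebase turned out wrong, the old commits are one reset away.
git reset -q --hard topic-backup
restored_head=$(git rev-parse HEAD)

# Once the outcome is settled, drop the bookmark.
git branch -q -D topic-backup
echo "old=$old_head rebased=$rebased_head restored=$restored_head"
```

The reset here only stands in for "the rebase went badly"; in the happy path you would skip it and just delete the backup branch.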
And we use git-rebase commonly on public commits without concern. Everyone uses git differently.
I think this depends on your definition of public? For me public means "in stable branches". I sure hope you aren't rebasing master.
That's a weird definition of 'public'; for most folks, I imagine 'public' to mean anything accessible without any authorization.
When people talk about public commits, they mean commits other people have access to and, more specifically, were able to commit on top of. Authorization has nothing to do with it. In practice it means anything you have pushed.

An exception can be made for topic branches, especially in a pull-request workflow. These branches could be rebased / amended to update the final result, even after they have been pushed already.

Yeah, maybe public isn't a good term for it; I usually use the term stable. So any pushed commits are indeed public, but they are only stable when they reach a stable branch. This means that if I am working on a branch I may remove, add, amend, or reorder commits even after I've pushed them. If you constrain rebasing only to unpushed (not public) commits then you instead end up discouraging pushing code to the remote.
> How easy is it to accidentally remove a line during interactive rebase and lose all work associated with it?

This is not only not easy, it's actually very difficult. If you drop something in an interactive rebase, you can reset your HEAD to the HEAD commit from before your rebase. It's a bit arcane, and has its own dangers, but it's also important to be clear that rebases are not destructive unless a git gc runs between your rebase and realizing you made a mistake. It's also the equivalent fix to checking out intermediate commits for a squash merge, as far as I know.

Don't get me wrong, I don't think rebase should be the first tool you reach for and I don't particularly like “rebase everything to master” workflows. But it's not as dangerous as you're making it sound, IMO.
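
A rough sketch of that recovery, assuming a throwaway repository. To keep it non-interactive, `git rebase --onto` stands in for deleting a `pick` line in `git rebase -i` (it has the same effect of silently dropping a commit); the branch and file names are invented for the demo.

```shell
# Sketch: drop a commit during a rebase, then recover it from the reflog.
# Everything here (names, files, messages) is illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt; git add one.txt; git commit -q -m "topic: one"
echo two > two.txt; git add two.txt; git commit -q -m "topic: two"
tip_before=$(git rev-parse HEAD)

# Replay only the last commit onto main, which drops "topic: one",
# much like deleting its line in an interactive rebase would.
git rebase -q --onto main topic~1 topic
if [ -f one.txt ]; then dropped=no; else dropped=yes; fi

# The branch reflog still records where topic pointed before the rebase.
tip_from_reflog=$(git rev-parse 'topic@{1}')
git reset -q --hard "$tip_from_reflog"

echo "dropped=$dropped, restored to $tip_from_reflog"
```

In real use you would read `git reflog` first and pick the entry by eye rather than trusting that `topic@{1}` is the one you want.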

correction: unless you wait 30 or 90 days (depending on what you do in the interim) and then run other git commands which trigger automatic gc (which typically only happens with large repositories) or manual gc (in which case you're just asking for problems)
> This is why my team and I moved to squash merging. Sure it has its own drawbacks, but they're far less worrisome than rebasing. If you screw up a rebase, the history is re-written or force-pushed by accident. If you screw up a squash merge, you can still check out the intermediate commits if you know the hash.

Until the original branch is deleted and the refs are garbage collected, anyways.

It seems strange to me to advocate for squash merging on a premise of not losing history. A squash merge is a rebase.

  > A squash merge is a rebase.
Well, it's not a rebase in the literal sense: the base commit isn't changing. It _is_ modifying history, though.
It probably is in the same sense that 'git rebase -i' often puts the new branch on a new base even if that wasn't your explicit intention. Usually merge squashes are reparented to the current head of the target branch.

But my point was just that a merge squash is just a specific incantation of the git-rebase tool. And it is one of the most history destroying incantations, rather than the least.

Ah, sorry, I'm probably misunderstanding a "squash merge". I thought you meant you have a graph like this:

    C -> D -> E
   /
  A -> B
then you "squash merge"

    S------
   /       \
  A -> B -> F 

where S is C + D + E.

Whereas a rebase + (now fast-forward) merge would be

  A -> B -> C' -> D' -> E'
and, if squashed during rebase -i or by some other means

  A -> B -> S
It seems you're saying a "squash merge" is the last one, I thought it was the second one.
That’s right. `git merge --squash` doesn’t even make a commit, it just updates the working tree and index to look like the post-merge contents — it’s then up to you to actually commit. So yeah, you don’t even end up with a merge commit.
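
A small sketch of that behaviour, assuming a throwaway repository with invented branch and file names: after `git merge --squash` HEAD has not moved, the changes are only staged, and the commit you then make by hand has a single parent.

```shell
# Sketch: `git merge --squash` stages the result but creates no commit.
# Names, files, and messages are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt; git add one.txt; git commit -q -m "topic: one"
echo two > two.txt; git add two.txt; git commit -q -m "topic: two"

git checkout -q main
echo up > up.txt; git add up.txt; git commit -q -m "main: upstream change"

head_before=$(git rev-parse HEAD)
git merge --squash topic >/dev/null 2>&1   # updates index and working tree only
head_after_merge=$(git rev-parse HEAD)
staged_count=$(git diff --cached --name-only | grep -c .)

git commit -q -m "squash of topic"         # the commit is a separate, manual step
parent_count=$(git cat-file -p HEAD | grep -c '^parent ')

echo "staged files: $staged_count, parents of new commit: $parent_count"
```

The single parent is what makes the result look like the `A -> B -> S` picture rather than a merge bubble.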
I'm sure someone out there does it like that (the joy and pain of git being there's a million ways of doing things), but the name I think is derived from the idea that you're replacing a merge commit with a squash of the disjoint parent, rather than a merge of a squash.

Also it's entirely possible I'm wrong, I'm a fan of merge bubbles (sometimes rebased for clarity), and avoid large squashes in general, so maybe I just don't understand how people do it. I don't know why you'd bother to keep both a merge and squash commit around, though.

Rebase is absolutely a part of my everyday workflow, and the rest of my team's workflow as well. (Yesterday I had to show our CTO how to use it, because the rest of the team was getting annoyed by his merge commits.)

Our workflow is:

- locally, commit to local master or a local branch

- occasionally checkout local master (if necessary) and pull using the 'rebase after fetch' option.

- if we had local master commits, fixup any conflicts in our code

- if we were working on a local branch, checkout that branch and rebase it on the new master, fixing any conflicts that arise

- If our work is complete, possibly do a final rebase to reorder and squash local commits, and fast-forward merge to master if we were on a branch. Finally, push the local master to share our work.

Note that we never rebase anything that's been pushed. Also, if we're worried that a rebase is potentially complex and error prone, we create a new branch at the existing HEAD so that the old commits don't get lost in the reflog. Once the rebase is done, we can delete that branch so the old commits can be garbage collected.
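
For anyone who wants to try the "pull with rebase, then push" part of this, here is a sketch using a local bare repository as a stand-in for the shared remote. The directory names (`teammate`, `me`) and files are hypothetical; only the git commands themselves matter.

```shell
# Sketch: local commits replayed on top of fetched work via `git pull --rebase`.
# The bare repo plays the shared remote; all names are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/origin.git"

# A teammate publishes the first commit on master.
git init -q "$work/teammate"
cd "$work/teammate"
git symbolic-ref HEAD refs/heads/master
echo a > a.txt; git add a.txt; git commit -q -m "teammate: first"
git remote add origin "$work/origin.git"
git push -q origin master

# We clone, and then both sides commit independently.
git clone -q -b master "$work/origin.git" "$work/me"
echo b > b.txt; git add b.txt; git commit -q -m "teammate: second"
git push -q origin master
teammate_tip=$(git rev-parse HEAD)

cd "$work/me"
echo c > c.txt; git add c.txt; git commit -q -m "me: local work"

# Fetch, then replay the local commit on top of the fetched master.
git pull -q --rebase origin master
merge_count=$(git rev-list --count --merges HEAD)
parent_of_mine=$(git rev-parse HEAD~1)

git push -q origin master      # plain fast-forward push, no force needed
remote_tip=$(git -C "$work/origin.git" rev-parse master)
my_tip=$(git rev-parse HEAD)
echo "merge commits in history: $merge_count"
```

Because only the unpushed commit gets rewritten, nothing anyone else has seen changes, which is the "never rebase anything that's been pushed" rule in action.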

Merge commits on pull are an annoying misfeature of the default workflow, but I thought you could fast-forward on pull to eliminate them?
> annoyed by his merge commits

Getting rid of merge commits is literally the only benefit of your workflow over the standard branching model.

Not true. We've also found that rebasing our own work onto the latest master makes it a lot easier to deal with conflicts. When you do a merge, the conflicts you see are a mixture of your own code and someone else's code, and it can be hard to tell which is correct because you're not familiar with the other person's changes. But when you rebase, all of the changes are code that you wrote, so it's easier to figure out how to fix the conflict.

This may sound counter-intuitive, because you're thinking it's the same conflict either way. Most of the time it is; if you and someone else changed the same bit of code, the conflict will be shown to you the same way whether you merge or rebase. Those are easy to fix. What's harder is when someone reorganizes some code without making significant changes to it. In a merge, you'll see changes all over the place, but in a rebase git can usually figure out the new line numbers, and may not indicate a conflict at all. The other area where my team has had difficulty with merges is in Visual Studio sln and csproj files. When you add new projects to a solution or new references to a project, git can present a very confusing diff during a merge conflict. But for rebase only your additions are highlighted, and most of the time you can solve the conflict with "use theirs before mine".

At least locally within your repo, it's actually very hard to lose work completely with git, via any combination of resets or rebases. The reflog (git reflog) stores all the movements of HEAD such that you can recover from almost any mistake as long as you realize you made it.

The fear of rebase seems to always come from its ability to "delete your work", but that fear is almost always unfounded and based on a lack of knowledge of git's internal structures. Of course, one of git's biggest and most widely recognized faults is that its UI makes no attempt to alleviate those fears.

> A rebase should not be part of any everyday workflow and must be reserved _only_ for exceptional situations

Git rebase is absolutely a vital part of the developer toolkit. If you're not using it, you're missing out on a big timesaver and git feature.

I think you must be talking about rebasing public commits/history. In the article he is specifically talking about private history and has a nice big warning against rebasing public history. Linus has explained the distinction pretty well before: http://www.mail-archive.com/dri-devel@lists.sourceforge.net/... .

This is FUD. Many people use git-rebase flawlessly, without any of the issues you've encountered.

Wouldn't your tests catch when lines get elided by manual merge conflict resolution? Have you seen mjd's Git Habits[0], which lays out a foolproof way of rebasing without losing lines?

[0] http://blog.plover.com/prog/git-habits.html

I completely disagree with this. You can easily undo a rebase locally if you mess up by resetting to hashes you pull out of the reflog.

When I'm dealing with my own work branches -- I absolutely rebase my commits before submitting a pull request to my team. I'll either squash irrelevant commits or I'll reword poorly written git commit messages. I also almost always default to `git pull --rebase` as well, as I hate the noise of merge commits littering up my commit log.

> This workflow is scary. A rebase should not be part of any everyday workflow and must be reserved _only_ for exceptional situations.

This is what Mercurial Evolve tries to solve. There's nothing wrong with rewriting draft commits. The only potential problem is rewriting public commits. Mercurial uses phases to distinguish drafts from published commits and you may optionally designate certain repositories as non-publishing, so that they can be used for collaboratively editing draft commits.

A similar de facto convention on git is to only rewrite certain branches (e.g. feature branches) but never rewrite others (e.g. master). Commits that are local-only can be rewritten at will. Mercurial just codifies this convention via phases.

Typically, I see folks create a working branch, make incremental commits, such as a commit per day so as not to lose work, then rebase them all into a single commit before pushing back to origin. We haven't had any problems with this approach so far. YMMV
This is misleading advice, principally because “squash merging” is the very definition of losing history — it literally discards the commit history of your branch, which will then be lost to GC.
My apologies if this comes across as overly harsh! This article was really quite well written, and it's only after having been burned through rebases that I've gotten so headstrong against it.

We need more articles like this! Thank you for your work!

What is the harm in rewriting history in your private "topic branch"? This is not uncommon for the use case the author was describing. It certainly creates a cleaner logical history.
> If you screw up a rebase, the history is re-written…

https://git-scm.com/docs/git-reflog

> …or force-pushed by accident

Well, you could nuke the .git folder "by accident" as well :) Jokes aside, don't force-push to "other's" branches.

If you really feel like you do have to force-push to someone else's branch, please use

    git push --force-with-lease origin other_branch
so you at least won't trash any commits your collaborator pushed after you last fetched their branch. git will see that your origin/other_branch doesn't match origin's other_branch, and will bail out so you can pull the changes and decide how to incorporate them.
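
A sketch of that bail-out, assuming a local bare repository as the shared remote and two hypothetical collaborators, `alice` and `bob`. The lease is taken from bob's remote-tracking `origin/other_branch`, which is stale by the time he pushes, so the push should be refused.

```shell
# Sketch: --force-with-lease refuses to overwrite commits you never fetched.
# The bare repo, alice, bob, and the file names are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)
git init -q --bare "$work/origin.git"

# alice owns other_branch and publishes its first commit.
git init -q "$work/alice"
cd "$work/alice"
git symbolic-ref HEAD refs/heads/other_branch
echo a > a.txt; git add a.txt; git commit -q -m "alice: first"
git remote add origin "$work/origin.git"
git push -q origin other_branch

# bob clones; his origin/other_branch points at alice's first commit.
git clone -q -b other_branch "$work/origin.git" "$work/bob"

# alice pushes more work that bob has not fetched.
echo b > b.txt; git add b.txt; git commit -q -m "alice: second"
git push -q origin other_branch
alice_tip=$(git rev-parse HEAD)

# bob rewrites history locally and tries to force-push with a lease.
cd "$work/bob"
git commit -q --amend -m "bob: rewritten first"
if git push -q --force-with-lease origin other_branch 2>/dev/null; then
  outcome=pushed
else
  outcome=rejected
fi

remote_tip=$(git -C "$work/origin.git" rev-parse other_branch)
echo "push was $outcome; remote is at $remote_tip"
```

A plain `--force` in the same spot would have gone through and discarded alice's second commit on the remote.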
> If you screw up a squash merge, you can still check out the intermediate commits if you know the hash.

if you're going to store commit hashes externally instead of using the reflog then it makes no difference whether you use rebase, merge, or even cherry-pick or reset.

also, sudo itself poses no risk; it's much more important to evaluate what you're running, instead of a blanket restriction on what is just another tool. if I run "cd /; rm -rf *" on my desktop, it doesn't really matter whether I'm running as root or as my user, I'm going to have a bad day. "curl | sh" is equally as dangerous as "curl | sudo sh".

The rebase would ideally be the last thing you do on your branch, and very carefully.

And I also mention in the post that we should always use what's best for our teams, so if the squash merge works best for you, go for it! :)

I'm so happy this uses squash merges. Using rebase for this purpose is overkill when you just want a clean commit history, given the complexity of the rebase tool. If you're writing a feature that involves a large number of changes throughout the codebase, rebase can be a handy tool to break up your work, but for the general case of smaller PRs a squash merge is the way to go.
I don't share this concern. I've been rebasing frequently (several times a day) for years now and never caused any unrecoverable problems. I would recommend that the last thing you do before pushing a branch you've rebased is

    git diff origin/master branch

This way you can see if the diff looks like what you expect it to be. If it doesn't and you believe you messed something up while rebasing just

    git reset --hard origin/branch

and redo the rebase

We've adopted it daily to solve the issue with bad merge history. It seems that git will do a merge on the server and is confused about who did what. Care to elaborate on how you solved this problem without rebase?
Why squash merge instead of simple merge?
At the same time, with this workflow (just like git flow) you're not doing continuous integration. Which is quite bad, imho.

https://www.franzoni.eu/git-flow-is-superflous-and-complex/

If standing up a deployment or test environment for a branch that's not yet merged is difficult or impossible, that's what prevents you from doing CI, not whatever branching strategy you use.

The model described in this post is basically the model used with svn, and it brings with it its own vast set of problems that active uses of topic branches are intended to resolve. We've been down this road before and it's not roses and sunshine.

Nothing stops you from using continuous integration with this workflow. For instance, Travis-CI and Codeship support branching (they can tell you if the new build will be fine or not).

I didn't mention that in the post because it's intended for beginners and adding that info there could maybe be a little too much :/

Sure, most CI systems support branching.

But what if you need to ship a full pipeline of a branch for a multi-repository project? I clarify the issue better in my post.

Your suggestion feels a little misguided. You end your post by saying that you contest the a priori idea of branching, but seem to forget everything that is right about branching in the first place:

* Want to see the history for a specific feature? Impossible in your proposal, native in a feature/topic branching model.

* Want to do a code review on a specific feature? Again, impossible in your proposal, trivial in a feature/topic branching model.

* Want multiple developers to work under the same codebase with minimal conflict resolution and clear separation of tasks? Very hard under your proposal, easy in a feature/topic branching model.

You also say that using topic branching means not doing CI. I've used the idea of proper feature branching for years, and have never _not_ had a CI process. CI tools are most definitely ready for (and are quite welcoming of) workflows like git-flow. I'd be happy to speak at more length about how we implement it if you'd like, but I assure you it is anything but complicated.

Hello Fred, thanks for your interest.

About the history for a feature: most branching-based workflows prefer squashing commits when merging, so most probably the history is lost anyway.

You're right, I didn't explain how I do code reviews. In practice, I usually prefer pair programming when pushing, and after-the-commit code reviews, but before the feature is toggled on by default.

There's a reason for this: I usually say that a review should cover the state of the code after a merge, not just a change. Many times I've seen reviews for a PR that miss the whole point, along the lines of "the change is good, even though the resulting merged code is a complete mess". On the contrary, if you review a certain commit before toggling a feature on, you're basically declaring that the code, at that point, is good. Yes, it may be hard for a large codebase, and often reviews are done by looking at what changed, and not at everything.

But I've seen many, many, many stupid errors done or overlooked because people just looked at the change and not at the whole code after such change.

Wow, a blog post called "Git Workflow Basics" that really does just have simple and useful basics and doesn't try to terrify you out of or into using specific git features that the author decided are infinitely sacred or infinitely profane.

I can't really explain the depth of how pleasant, and how surprising, this pleasant surprise was. Thanks!

Could anyone who follows the "rebase all the merges" workflow detail why they choose to work that way? It seems to me that Git's strength is being able to time travel in your repo (especially with something like git-bisect, one of the few tools I'd call downright magical)

But if you're rebasing your commits, haven't you lost that? The concerns about a "clean commit graph" seem more aesthetic than functional.

I don't know about rebasing everything, but a commit is a changeset, and a changeset should always have a distinct and succinct purpose. There is no value added in having a single feature spread out over 5 commits.

When writing a feature, I use git to save my progress, and once I'm done, I would like to present the feature as a clear and complete changeset. A commit is me saying, "These are the changes I'd like to make to the codebase", and I feel that argument is easier to make when I present one thought, rather than the dozens of thoughts I had on the way.

If I were perfect, then I would commit in a way that was one cohesive thought, but I'm not. My commits are often, "Did the thing", then "redid the thing, with better testability", and "Re-redid the thing, fixing some fundamental bug in how I did the thing at first", etc.

I really don't understand all the opposition to this idea. This is exactly how git should be used, and it is how most successful large open source projects require you to work.

  > more aesthetic than functional
It is easier to understand a cleaner history than a messy one.

  > haven't you lost that?
You've lost the ability to look through one kind of history, but not other ones. bisect still works if you've rebased.
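
To make the bisect point concrete, here is a sketch on a throwaway repository: a topic branch is rebased onto an updated main, and `git bisect run` still pins down the rebased commit that introduced the problem. The "bug" is simulated by the presence of a made-up file `bug.txt`, and the check script is equally hypothetical.

```shell
# Sketch: bisecting across a rebased (linear) history.
# The repo, branch names, and the bug.txt "defect" are illustrative.
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main

echo base > base.txt; git add base.txt; git commit -q -m "base"

git checkout -q -b topic
echo one > one.txt; git add one.txt; git commit -q -m "topic: harmless change"
echo oops > bug.txt; git add bug.txt; git commit -q -m "topic: introduces bug"
echo three > three.txt; git add three.txt; git commit -q -m "topic: later change"

git checkout -q main
echo up > up.txt; git add up.txt; git commit -q -m "main: upstream change"

git checkout -q topic
git rebase -q main                       # topic's commits are rewritten on main

# The rebased topic tip is known bad, main is known good.
git bisect start topic main >/dev/null
# A commit counts as bad when bug.txt exists (non-zero exit marks it bad).
git bisect run sh -c 'test ! -f bug.txt' >/dev/null
culprit=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1

echo "first bad commit: $culprit"
```

Had the three topic commits been squashed into one, bisect would still work, but it could only point at the combined commit.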
Only easier to understand if the commit messages suck. If your commit messages are descriptive, and each commit is an atomic unit of work (in other words: following best practices) rebasing has thrown away history that you can never get back.
Right, but often, that's not something that happens the first time around. The idea is that you rebase in order to get that kind of history 100% of the time.

You cannot get things perfect on the first try; this is part of the whole principle of code review. When my patch is perfect, except for that one little typo, what should be done? Is a history with two commits, one amazing, one saying "fix typo" with a one-character diff, or one commit that's perfect, an easier to understand history? What is actually lost by "throw[ing] away history that you can never get back"?

If it had been right in the first time, that history would have never even existed in the first place. So you end up with the exact same thing.

Did that typo fix introduce a bug? Maybe, maybe not, but many programmer hours have been wasted on incorrect single characters :)

If I were bisecting that repo, it's a lot easier and more useful to be able to point the finger at the one commit that actually changed the line, rather than having to parse the one monster squashed commit to find the one line that introduced the bug.

I tend to write documentation, so no, not a bug. But even then, there's lots of small code-review things that aren't always about bugs; project style, naming conventions, etc.

Furthermore, if this is a PR that's open, then the "bug" would have never even landed. So looking through history to "find what caused the bug" would have not even been a thing.

Personally, I prefer having a develop branch which all the development happens on, with developers creating feature branches from that, and then merging develop into master when a new version is achieved.
Keep in mind that I only work on small teams with smallish projects; this workflow would probably not work for big projects with a bunch of developers working simultaneously.
Actually this works perfectly fine in large teams too, if you add another level of branches. In fact, that is how Linux kernel development works.
How do you guys approach the one branch per developer rule? For smaller changes, I do it all the time but it is very hard for bigger changes.

When we write bigger features we always need at least two devs: one writing the front end (HTML templates) and one the backend (whatever populates the templates and makes SQL queries). We both need immediate feedback. I design the models around the templates, so I need the templates to at least partly work. He needs the models to properly do his work too (if he wrote the templates before I wrote the models, the development of the whole feature would slow down, and it is not nice to work only with lorem ipsum all the time). We also get very detached from the actual feature that way.

How do you manage those situations? Just do one branch per feature and if that feature requires more work, then just let two people work on that feature?

Do what you say and both work in the same branch. If the branch diverges from master and you need to synchronize, then one of you does a rebase and pushes it, and the other does git reset --hard origin/feature-branch. Alternatively, just leave any cleanup until the end of the MR and designate one person to do it.
So, commenters do not seem particularly impressed with this workflow.

What is a good workflow around git?

There are plenty; there's no one right workflow, because different teams have different requirements.

At work, we use deploy branches for a few repos, and integration branches for others. Personally, I use rebase for my own projects.

Does anybody dislike git?
The English needs a massage here. The very first sentence is grammatically incorrect, which doesn't inspire confidence. It's a fine introductory post for learning Git, though, and as such I welcome it.
Typo - CTRL + F "Oficial"
Thanks for pointing this out. It's fixed!
I would re-write the first sentence to "You probably know how to use Git on a daily basis" or perhaps "You probably know how to use Git in your daily workflow."
Ugh, not this again. How many threads like this do we need before folks accept that git is not the solution to whatever problem we have?
Could you be more specific?
This article appears to be a cheap copy of Atlassian's Git workflows and tutorial:

https://www.atlassian.com/git/tutorials/comparing-workflows/