by radarsat1 2716 days ago
> WHY there is a staging area

I understand your second point, but I have a hard time understanding the difficulty with this part. Why is it hard for people to understand the idea of staging?

You put things in a box one at a time before closing the box. Does it require more explanation than that? What do people find difficult about it?

9 comments

People are very used to the web "save always" style: There is one document, and you're editing it. Most people will be familiar with the traditional desktop "save" model where you have to do something to make your changes permanent.

People often then learn that there is a local file and some remote file: they can cope with a save -> upload workflow. Lots of traditional VCS turn this into a save -> commit workflow.

Git adds two stages to this that people can't see the need for without understanding the internals: an extra step between save and commit, and an extra step after commit.

(The discussion reminds me of all those people who think that if they just start by talking about monads then people will find Haskell easy and natural...)

There are a whole bunch of layers now, though they're all useful.

1. Is my document saved?

2. Are the changes staged?

3. Are the changes committed?

4. Are the changes pushed to my fork on e.g. github?

5. Are the changes merged into the upstream repository on e.g. github?
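For what it's worth, layers 2 to 4 can each be checked with one command. A rough sketch, assuming a throwaway repository and a local bare repository standing in for the fork (`notes.txt` and `fork.git` are made-up names); layer 1 lives in the editor and layer 5 on the hosting service, so neither is shown:

```shell
#!/bin/sh
# Sketch only: a throwaway repo plus a local bare repo standing in for "my fork".
set -e
work=$(mktemp -d)
git init -q --bare "$work/fork.git"
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$work/fork.git"

echo "first draft" > notes.txt
git add notes.txt
git commit -q -m "Add notes"
git push -q -u origin HEAD                 # publish and remember the upstream

echo "second draft" >> notes.txt           # 1. saved on disk, nothing more

unstaged=$(git diff --name-only)           # 2. lists files saved but not staged

git add notes.txt
staged=$(git diff --cached --name-only)    # 3. lists files staged but not committed

git commit -q -m "Revise notes"
unpushed=$(git rev-list --count '@{u}..HEAD')    # 4. commits not yet on the fork

git push -q
after_push=$(git rev-list --count '@{u}..HEAD')  # back to zero once pushed
```

An empty result from the first two checks and a zero from the third mean there is nothing pending at that layer.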

They don't need to understand the internals for this: just knowing that every save you do will be stored forever as-is makes you think twice about what you put inside.
So I have a solid mental model of git, and I understand the theoretical need for the staging area.

However, I find the occasions for using the staging area in practice are few and far between, for the simple reason that I can't test and execute the code that's in the staging area without the code from the working directory also being there. It feels like after having partially staged some of my working directory, it would be a blind commit with no guarantee that things are working.

Very rare is the situation that I can break out a list of files over here that are for feature A and some over there for feature B, and never the two shall interact.

I think this is probably what most struggle with regarding the staging area, without being able to articulate it.

I use it quite a lot, especially with `git add -p` to stage only parts of a file for an atomic commit.
I second this. It wasn't until I adopted this practice that the staging area really made sense to me. I find it helpful not just for making atomic commits, but as a way of remembering what I was actually doing, so that I can write a good commit statement.
This has never made sense to me. I've seen others say that they commit only parts of a file. How does this scenario start? Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?
Partly, yes. Or, I'll be working on a task overall, and have to touch multiple files in the process. Then when I'm ready to commit, I review all the modified files on disk, and look for ways to break those down into smaller discrete logical changes. I prefer to avoid "big bang" commits as much as possible, because smaller individual commits are easier to inspect, easier to back out if necessary, and provide a better "story" when inspecting a file's history sometime down the road.
But then, you either never run/tested those smaller individual commits, or you have to do extra work (stash changes, test, restore stash) to do that.

I do not see why a source control system should make it easier to make a commit that hasn’t ever existed on disk and thus cannot have been tested.

I think the better model would be to stash your changes and have a diff editor between the on-disk working copy and the stashed version that allows you to commit a set of changes as several smaller, more coherent commits.

That wouldn’t guarantee that each of those intermediate commits gets tested or even built, but it would guarantee that each smaller commit is in the on-disk copy at some time.
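One existing way to get the staged state onto disk before committing is `git stash push --keep-index`. A rough sketch, assuming two made-up files where only one is ready to commit; it is the "extra work" mentioned above rather than a way around it:

```shell
#!/bin/sh
# Sketch only: make the working tree match the index, test, commit, restore.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > feature_a.txt
echo "v1" > feature_b.txt
git add .
git commit -q -m "Initial"

echo "v2" > feature_a.txt      # ready to commit
echo "wip" > feature_b.txt     # still in progress
git add feature_a.txt

# Stash the unstaged work, but keep the staged changes in the index and on disk.
git stash push -q --keep-index

# The directory now holds exactly what the commit will contain, so a build
# or test run at this point exercises that state.
tested_a=$(cat feature_a.txt)
tested_b=$(cat feature_b.txt)

git commit -q -m "Update feature A"
git stash pop -q               # bring the in-progress work back
restored_b=$(cat feature_b.txt)
```

Untracked files are not stashed by default, so they would still be present during the test run.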

> But then, you either never run/tested those smaller individual commits

Not necessarily. One nice option that the git rebase command has is --exec (which can be specified multiple times). So you can run a rebase and have git execute a command (like running a test suite) for each commit in the branch. If any commit fails, the rebase process will stop and let you amend the commit to fix the issue.
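A small sketch of what that looks like, assuming a throwaway repository with three made-up commits. The `--exec` command here is only a stand-in that records how many files exist at each commit; a real project would put its test runner there instead:

```shell
#!/bin/sh
# Sketch only: run a check against every commit being replayed.
set -e
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.name "Demo"
git config user.email "demo@example.com"

for n in 1 2 3; do
    echo "step $n" > "step$n.txt"
    git add "step$n.txt"
    git commit -q -m "Step $n"
done

# Replay the last two commits. After each one, git runs the --exec command
# from the top of the working tree; a non-zero exit status stops the rebase.
# The log file lives outside the repository so the working tree stays clean.
git rebase -q --exec "ls step*.txt | wc -l >> '$work/log'" HEAD~2

runs=$(wc -l < "$work/log" | tr -d ' ')
```

When the command fails partway through, fixing the commit and running `git rebase --continue` resumes from where it stopped.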

> or you have to do extra work (stash changes, test, restore stash) to do that.

I've found that it's easier to write and locally test a given feature and then incrementally stage parts of it and create commits before pushing the code up for review. To me, that's easier than just making a large commit and then trying to split it out into a better set of commits after the fact.

For example, I may write a new method and then call it several places in the code. So my first commit would be to add the new method along with its unit tests and my second commit would be to add calls to it in the code base and update the associated integration tests (if necessary).

One common scenario is that I'm working on one problem, and in the process of solving that issue do some refactoring of related code. In this case, I want to commit the refactoring (which does not change the program's behaviour) before committing the changes that do change the program's behaviour.
I typically then send that first refactoring commit to Github (on its own branch) so that it gets full CI test coverage. And then continue working on the fix/feature while it runs.
One use case is to exclude extra lines of the file you don't want to commit. For example, I might have some debug print statements in my file that I want to keep in my local copy of the file while testing, but I don't want to include in the commit I push up for review.
> Are you working on solving one problem, but then notice some other unrelated issue and fix that too, before committing the first change?

Almost. Most often it's:

- Working on solving problem A

- Notice problem B

- Start to solve problem B

- Notice I'm getting distracted from A, and return to finish it.

- Want to commit my fix for A, but don't want to lose or forget the partial work on B.

Two different approaches I might take in this situation, depending on whether B is related to A.

1. If they are related (eg, B depends on A), use `git add --patch` to stage and commit A, then finish and commit B.

2. If unrelated, use `git stash --patch` to stash B, then commit A, then switch to a different branch to finish B.

Honestly, I see the point of both stash and staging, but not both together. Too many tools for the same job. On my long list of projects to do is a git porcelain that combines some of these concepts (eg, stash and working directory which would be tied to a branch):

- Each branch would have a single stash.

- When you check out a new branch, all uncommitted changes are automatically stashed.

- If the branch you're switching to has anything stashed, that stash gets popped.

- Any current workflow that involves stashing can be replicated by using a branch instead of a stash.

This way, branches can be thought of as "state of the working directory", which is more intuitive with the branching tree model, imo; commits are a snapshot of the repo at that point in time; and the staging area is just a way to choose what should be included in those commits.

Amending the last commit does basically the same thing and records each state in the reflog.
You never amend commits or rebase locally before pushing? I rebase before pushing almost every time.

Git’s workflow wouldn’t even be sane without the staging area. This is what allows you to fix mistakes and make your work presentable for remotes.

> Git’s workflow wouldn’t even be sane without the staging area. This is what allows you to fix mistakes and make your work presentable for remotes.

I did exactly the same diff/tidy/diff workflow when I used p4 and svn, neither of which make a distinction between "working directory" and "staging area".

Right, but p4 & svn have “checkout” which is similar to staging. Staging is part of what we get because we can edit files without having to checkout / open for edit.

P4 and svn don’t have a strict commit parentage, which is why you can push commits in those systems in any order. Git’s strict concept of parentage is what makes the staging area so important for keeping your workflow similar to p4 & svn workflows. Without a staging area, you’d either have to always fix mistakes with new commits, which is bad, or rewrite already pushed history, which is worse.

> without having to checkout / open for edit.

The terminology is a bit different - unless configured with mandatory locking (essential for some workflows) you don't have to open for edit. You just edit stuff and it goes in the "default changelist", roughly equivalent to automatic staging.

> Without a staging area, you’d either have to always fix mistakes with new commits

Mistakes at what point? In the normal svn workflow you can review with svn diff, then when you're happy do svn commit; it's just that there's no local place you're committing to. In both cases there's a critical point, either "svn commit" or "git push".

> unless configured with mandatory locking ... you don’t have to open for edit.

I’d guess you’re leaning toward talking about svn, which I don’t remember very well, and I am leaning toward talking about p4, which always does mandatory locking.

You’re right the terminology is different between these different systems, I’m just pointing out that the git staging area has what you can think of as some equivalences in the other systems. Or, you can think of it as tradeoffs. Either way, the git staging area is something that helps you pretend like you’re using svn or p4 in the sense that it helps support editing multiple changes at the same time before pushing them to a server.

> Mistakes at what point?

With git I’m referring to mistakes between commit and push. But there’s a philosophical difference here that I glossed over. With git it’s easier to commit early and often than it is with svn or p4. With svn & p4 it’s easier to lose your work because version control doesn’t know anything about it before you push. If I make micro-commits, which I want and I like, then I put more “mistakes” along the way into my local history, and I can use the staging area to clean everything up before I push. With svn & p4, you make those mistakes and do the cleanup without ever telling the version control, and you run a greater risk of losing that work while you do the cleanup.

Never, and can never remember what rebase actually means.

At work I’ll hit the squash option on GitLab’s merge request, which moots all local machinations.

judging by the atrocious management of remote history I've seen at workplaces, "making work presentable" is pretty far down the line of priorities
Amending commits and rebasing involve the staging area?
Usually. You can also amend and rebase remote commits, but that’s usually a big no-no.
Committing isn't a commitment. After making the first commit, you can use the `git stash` command to put the rest of your changes aside, and go through the normal test->amend loop until you're happy with that first commit. Then you just retrieve your other changes from the stash to make your second commit.

It's also possible to do this without the stash command, by making both commits right away, and testing them later. However, that would involve rebasing(?) your second commit on top of any changes you end up making to your first commit, so using the stash makes more sense to me personally.

Fwiw, stash can get you into trouble more easily than commit. It’s no more typing to commit or branch, so I recommend preferring those to stash when it makes sense, or when you’re playing with changes you don’t want to lose. Stash is handy for a bunch of things, so use it by all means, just remember that there’s often an equivalent way that is just as easy and much safer.

The git stash man page talks about this: https://git-scm.com/docs/git-stash

“If you mistakenly drop or clear stash entries, they cannot be recovered through the normal safety mechanisms.”

One of the best things about git is how big the safety net is, as long as you tell git about your changes. Almost any mistake can be fixed, so why use features that aren’t sitting over the safety net?

A scenario:

You're adding a feature to your proggie. That involves modifying the main bits to add the feature and, say, adding a couple of interfaces to internal library modules.

Split out the changes to the library modules into separate commits---it's safe because nothing uses them, they're logically separate from the feature changes (although they don't appear to have a justification without the feature), the log will be marginally cleaner, and git bisect will have more granularity.

Why is the staging area needed in such a case ? In more traditional systems, you'd just do, say, "svn commit library/" and then commit the rest. (and you could do just the same in git too without seeing the staging area)
Understanding the staging area first requires understanding the need for it: The need for atomic commits. The need to create commits that have specific changes in them and are not always a snapshot of the entire world below the git root exactly as is right now.
Yes, it requires more explanation than that. I've used git for years, and never really understood why staging is even a thing.

Your example is an implementation of the box-putting algorithm, but it doesn't need to be mirrored in the put-box CLI.

    put-close-box file1 file2
This command could encompass all the putting and closing. Since you only close boxes when you are done putting things in it, I don't see a need or purpose to split it up.

    put-box file1 file2
    close-box
A closed box (commit) is always going to contain stuff that was put in it, so why separate commands?
That's not convenient when you're putting things into the box piecemeal, especially with `git add -p`. A thing I do frequently is to run `git diff`, scan through it, and add files (or parts of files) one by one in a second terminal. Then I do a final review of the staging area (with `git diff --cached`) to make sure it only has the changes I want and commit. I'm the sole devops engineer at my company and my workflow is a bit more scattered than a typical developer's.

Anyway, `git commit file1 file2` by itself is most of the way to being the put-close-box function you want; it just doesn't work for adding/deleting files from the repo. Seems like they could make a lot of people happy by closing that gap and letting `git add` be an intermediate-level feature.
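The gap is easy to see in a scratch repository. A minimal sketch with made-up file names: committing by path works for files git already tracks, and is refused for a file it has never seen:

```shell
#!/bin/sh
# Sketch only: committing by path, without a separate `git add` step.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "v1" > file1
echo "v1" > file2
git add file1 file2
git commit -q -m "Initial"

# Tracked files: naming them on the command line commits their current
# contents directly, with no explicit staging step.
echo "v2" > file1
echo "v2" > file2
git commit -q -m "Update file1 only" file1
left_over=$(git diff --name-only)      # file2 is still modified

# Untracked file: git rejects the path because it does not know it yet.
echo "new" > file3
if git commit -q -m "Add file3" file3 2> /dev/null; then
    new_file_result="committed"
else
    new_file_result="rejected"
fi
```

So for day-to-day edits to existing files the one-step form is already there; only new files need a `git add` first.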

To me, that ought to be a concern of the "porcelain", although no one uses that word anymore. CLI is particularly bad at certain types of interaction. So to compensate, a mitigation is moved into the underlying model of git. That mitigation is staging. The inconvenience of "piecemeal adding" could have easily been addressed in the UI layer using a more suitable presentation, rather than forcing all clients to follow the stage/commit dichotomy.
For simple projects (like ppl experimenting with git) you will always want to save all changes. So why stage first ?
Not everyone stays a beginner forever, and it's nice to have a tool that doesn't play to the lowest common denominator. It's really not that hard to just do a "git commit -a" if you want to avoid staging.
> Not everyone stays a beginner forever

But the vast majority do, or at best become perpetual intermediates (https://blog.codinghorror.com/defending-perpetual-intermedia...).

99% of developers out there didn't need a power tool for source control (source control is already quite a power tool many devs can barely handle, even in SVN form...), yet here we are: Git is imposed everywhere, with its horrible UX.

Git's UX isn't that bad if you're only cloning projects to build them locally and keep them updated. The UX only gets really crufty as you use more and more of the features.
I think people find it difficult because for most beginners at git, they just want to put everything in the box. Having the option to put just some things in the box seems more complicated than needed. Obviously, as you get better with the tool, you realize the power of literally "staging" your changes into multiple commits, but as a beginner, it's not even in your purview.
My hurdle was 15-20yrs of no staging area from previous VCSes so the extra step took some time to understand why it was needed.
Isn't the staging area closer to an intermediary box? That's where it can get confusing.
Staging puts things in the box, commit closes the box, puts it on the pile with the other boxes, and gives you a new empty staging box.
But why is it an extra step? It's basically just a "longterm" selection of what you want to commit.
Because you don't always want to put everything in the box (and if you do, there's a shortcut to do it), and "git commit file1 folder/folder/*.cpp folder/folder/*.h ..." for a complex set would be annoying and require you to mentally keep track of it from the beginning.

Many beginners will start by always doing "git commit -a" and that's fine, as long as they know there's an alternative once they need it.

But why is the exceptional case the default?

Surely, most of the time when you go to commit, it's all the files you've changed?

Not for me! I often find myself refactoring tangential features while producing a new one. Sometimes that will even intersect in a single file. But that refactoring doesn't come with any changes relevant to the feature I am working on in my branch. So I save them for their own isolated commit(s). While this doesn't happen on every commit, it probably happens for me about every other push. The alternative is bundling in a bunch of changes that have very little to do with the feature that my branch is ostensibly about.

EDIT: Now that I think about it, I also have several repos where I have changes that I never intend to commit, because they are development conveniences for me personally.

Not really. I think of my git use case at work as pretty simple. I usually stash, pull down, fast-forward and then pop my stash on top. Occasionally I'll need to rebase too. Just to show I'm not a super advanced user or anything.

I'm a JS dev mainly working in React on a web app with a backend team using PHP. Often I'll be working on a branch with maybe 2 or 3 people and I often end up working on a few things at a time. Say I'm working on a feature, and I notice some bug I'll fix that and then get on with my feature. Once I go to commit I pretty much always do a 'git add . -p' and I very rarely want to add all the files I've worked on!

Even things like switching a config file to use a service like apiary where I don't want to commit my change to the config to use apiary.. Or change to my webpack config for testing, etc.

I've used Perforce, SVN and Git and the whole 'staging' area thing always felt very natural to me. Here are the files you've edited, which ones want to be committed? It gives me a second chance to go through and check everything before I've committed, and often that stops me leaving in any odd comments or debug code.

Almost never actually. I never commit all the changes in my repo (for big projects I often have some small changes in other places, I don't want to commit them)
My point was more why staging is a special feature that even has a name. You're basically just selecting what changes you want to commit.

What is the usecase where one needs to remember that selection for more than just a few minutes?

probably related changes grouped together