Hacker News
by bironran 1758 days ago
Ugh. So many concepts. So many things to remember. Why? Git is simple. SIMPLE. But only, IMO, if you go bottom-up and not top-down. There are only 6 critical concepts in Git and each is simple enough to be described in a single sentence.

1. Commits are immutable blobs that have one or more parents (the very first commit has none). Graphs, not trees. Anyone who uses trees for git commits misses the whole point and makes their (and their collaborators') lives complicated.

2. Tags are (mostly, best practice) immutable pointers to commits. Tags are "this is this thing FOREVER*."

3. Branches are named, mutable (by design) pointers to commits. Branches are "this is this thing FOR NOW. Later it'll be something else."

4. HEAD is a special "branch" that moves around automatically.

5. Origin is the local snapshot of the remote. Origin is "what did it look like when I last looked."

6. (fundamental but not critical) Remote is the current remote state (queried by RPC).

7. Index (aka stage) is where you put changes you want to make into commits. (this is somewhat simplified). Index is "My current and immediate plan. Scrub as needed."

That's (mostly, for non-advanced use cases) it. Everything else is commands to query or manipulate the various pieces of state. Every action (until it becomes instinctual knowledge) should follow the same recipe: 1. Figure out the current state (current commit graph, relevant branches). 2. Figure out the target state (desired commit graph, new branch positions). 3. Mutate using ANY command you want.
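
A minimal Python sketch of the model described above may make the recipe concrete. It is a toy, not Git's storage format: the `Commit` and `Repo` classes, the commit IDs `c1`/`c2`, and the file names are all made up for illustration. It shows the pieces of state from the list and two different routes that end in the same target state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable node in the commit graph (toy model, not Git's storage format)."""
    id: str
    parents: tuple = ()  # empty for the first commit, two or more for a merge


class Repo:
    """Hypothetical container for the pieces of state listed above."""

    def __init__(self):
        self.commits = {}    # id -> Commit
        self.tags = {}       # name -> commit id, meant to stay put
        self.branches = {}   # name -> commit id, meant to move
        self.head = "main"   # branch that HEAD currently follows
        self.origin = {}     # branch name -> commit id as of the last fetch
        self.index = set()   # paths staged for the next commit

    def commit(self, commit_id):
        """Turn the index into a commit and advance the current branch to it."""
        tip = self.branches.get(self.head)
        self.commits[commit_id] = Commit(commit_id, (tip,) if tip else ())
        self.branches[self.head] = commit_id
        self.index.clear()

    def fetch(self, remote_branches):
        """Refresh the snapshot of the remote; local branches do not move."""
        self.origin = dict(remote_branches)


repo = Repo()
repo.index.add("README")
repo.commit("c1")
repo.tags["v1.0"] = "c1"
repo.index.add("main.py")
repo.commit("c2")
repo.fetch({"main": "c1"})

# Recipe: 1. current state, 2. target state, 3. mutate by any route.
target = {"main": "c1"}            # goal: move main back to c1
repo.branches["main"] = "c1"       # route A: repoint the branch in place
state_a = dict(repo.branches)
repo.branches["main"] = "c2"       # undo, then try another route
del repo.branches["main"]          # route B: delete the branch...
repo.branches["main"] = "c1"       # ...and recreate it at c1
state_b = dict(repo.branches)
```

Both routes leave `main` at `c1`, the tag untouched, and `c2` still present in `repo.commits`, which is the sense in which the commands are interchangeable once the current and target states are known.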

I think that's the issue really. Inexperienced devs / people who don't understand git look at commands as "this is how to do a thing". No. In Git there isn't "how to do the thing". It's exactly like writing code - so many ways to achieve the goal, just choose your own. It might be efficient and elegant, or bumbling and ugly, but it'll get there.

7 comments

Sorry, but what? Specifically: what about your writing justifies the assertion that git is simple? Git is a horrible convoluted set of commands to make a lot of different data structures [1] interact. And if you do it wrong you can get into very weird states. This is not simple in any meaningful sense of the word!

Heck, "a monoid in the category of endofunctors" is simpler.

[1] Off the top of my head: The working tree, the index, the stash, the repo DAG, the local remote repo DAG, the remote repo DAG. Of course the branch labels are further state, and working with the commits directly is discouraged. Oh and files can be either tracked or not, and they can either be ignored or not. And one isn't a subset of the other. And that also interacts with the various state transitions.

Any system with a limited number of concepts is simple. Emergent properties are easy to predict and explore. Physics of "perfect frictionless sphere in vacuum" is so easy to understand we teach it in grade school and toddlers grasp it by instinct.

I can't (yet) reason about monoids easily. But I can reason about Git, even if I can't figure out the single command to change the state the way I want it and have to resort to multiple commands. I guess it's easier for me to think in graphs.

But it's not a limited number of concepts. One sphere in a vacuum is easy, three spheres is hard. Git has half a dozen subtle interacting data structures. But because people have built up a lot of experience working with them (and don't coach beginners and non-programmers) they shout "it's just a DAG, so simple!" and pretend like everything is fine...
Agreed. I have a couple of things to make my life slightly easier.

I could never understand what kind of twilight zone stashes go into or remember which stash is which when I had too many of them. So I never use stashes any more, I just make a branch instead.

I largely use git add -A, so I can pretend that the index does not exist.

2. Tags are named, mutable pointers to commits.

3. Branches are named, mutable pointers to commits, that you can "ride". While you "ride" a branch it keeps moving to always point to your latest commit.

4. HEAD is an implicit branch that you "ride" at all times.
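
The "ride" idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the `Refs` class and the IDs `c1` to `c3` are hypothetical, and HEAD is modeled as either attached to a branch name or detached at a commit.

```python
class Refs:
    """Toy model of branches, tags, and HEAD (names are illustrative only)."""

    def __init__(self, start):
        self.branches = {"main": start}
        self.tags = {}
        self.head = ("branch", "main")  # HEAD is attached to ("rides") main

    def resolve_head(self):
        """Return the commit ID that HEAD currently points at."""
        kind, value = self.head
        return self.branches[value] if kind == "branch" else value

    def commit(self, new_id):
        kind, value = self.head
        if kind == "branch":
            self.branches[value] = new_id   # the ridden branch moves along
        else:
            self.head = ("commit", new_id)  # detached: only HEAD moves

    def checkout_tag(self, name):
        # A tag cannot be ridden, so HEAD detaches at the tagged commit.
        self.head = ("commit", self.tags[name])


refs = Refs("c1")
refs.tags["v1"] = "c1"
refs.commit("c2")        # main follows to c2, v1 stays at c1
refs.checkout_tag("v1")
refs.commit("c3")        # only HEAD moves; main and v1 are untouched
```

In this sketch the only difference between a tag and a branch is whether `commit` is allowed to move it, which is the distinction the "ride" wording is trying to capture.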

afaik, the commit a tag points to is immutable unless you delete and recreate (and then nothing is immutable really).

re "ride" - that's exactly what I'm trying to avoid. It's an additional concept that isn't needed to understand Git. You need to understand the model. The "ride" is an emergent property of the model and commands that you eventually understand, but not a core part.

Without the concept of "riding", the terms "tag" and "branch" would become exact synonyms. In that case you can just remove point 2 (consider it just syntactic sugar) and thus simplify your list.

If a tag has any attempt at immutability at the data structure level, I know nothing of it.

It's all about how other clients treat branches and tags.

Once you've pushed a tag, no other clients will be willing to update their definition of that tag unless the users on those other devices force the issue.

So operationally, "tags are immutable once pushed" is a pretty reasonable way to look at things.

Remotely pushed branches of course also won't allow you to do anything but append without forcing on remote clients, so mutable and immutable isn't quite right, here.

So I guess I agree with your original contention, branches are mutable-and-you-can-ride where ride means "the remote client's porcelain will be happy with append mutations".
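
The "append only" rule is usually called a fast-forward: the new tip must have the old tip among its ancestors. A rough Python sketch of that check, assuming history is given as a plain dict from commit ID to parent IDs (the IDs below are made up):

```python
def ancestors(parents, commit):
    """Return the commit plus everything reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def is_fast_forward(parents, old_tip, new_tip):
    """True if moving a branch from old_tip to new_tip only appends history."""
    return old_tip in ancestors(parents, new_tip)


# Hypothetical history: a <- b <- c, plus b2, a rewritten replacement for b.
parents = {"a": (), "b": ("a",), "c": ("b",), "b2": ("a",)}

appended = is_fast_forward(parents, "b", "c")    # c builds on b
rewritten = is_fast_forward(parents, "b", "b2")  # b2 drops b, so it needs a force
```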

As far as I know, tags don't have an identity beyond their name. The CLI tries to steer you away from replacing tags (by naming the option --force rather than, e.g., --modify), but that doesn't make them immutable.
See git tag -f.

Sometimes it's reasonable to consider a tag immutable, though you should always checksum if you do.
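
One way to read "always checksum" is to record the object ID a tag resolved to and compare it later. The sketch below assumes the SHA-1 object format, where an ID is the hash of a small header plus the content; `tag_matches_pin` and the tag dict are hypothetical helpers, and blobs stand in for commits to keep it short.

```python
import hashlib


def git_object_id(kind, body):
    # Git hashes a header of the form "<kind> <size>" plus a NUL byte, then the body.
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tag_matches_pin(tags, name, pinned_id):
    """True only if the tag still resolves to the ID recorded earlier."""
    return tags.get(name) == pinned_id


# Stand-in objects: real tags point at commits, blobs just keep the demo short.
release = git_object_id("blob", b"release 1\n")
retagged = git_object_id("blob", b"release 1, quietly changed\n")

tags = {"v1.0": release}
pinned = release                      # recorded when the tag was first trusted
ok_before = tag_matches_pin(tags, "v1.0", pinned)
tags["v1.0"] = retagged               # what `git tag -f` amounts to in this toy model
ok_after = tag_matches_pin(tags, "v1.0", pinned)
```

The name `v1.0` survives the retag, but the pinned ID no longer matches, which is exactly what the checksum is there to catch.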

Reminds me of:

    Bad programmers worry about the code. Good programmers worry about data structures and their relationships.

    -Linus Torvalds
Anyway, it's not for everyone to get to understand git this way, I guess. Some people will just react "just tell me how to do X in git!"
Sometimes you just want your tool to get out of your way and get the job done, instead of deeply understanding it. No shame in that. There are limited hours in the day and sometimes other things are more important.
I get what you mean, but git is basically a tool for manipulating the .git directory (and a working directory checkout).

I think understanding what .git dir contains and represents is as important as understanding the tool.

It's not like you need to understand how the tool works internally, or how it's built.

Analogy would be that you want to understand how to use a hammer, sure, but also the characteristics of material you manipulate with it. You don't need to understand how the hammer is built.
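
As a small illustration of what the .git directory represents, here is a hedged Python sketch that follows HEAD to a commit ID by reading plain files. It builds a fake, minimal layout in a temporary directory, uses a made-up ID, and handles loose refs only (it ignores packed-refs and everything else a real repository contains).

```python
import tempfile
from pathlib import Path


def resolve_head(git_dir):
    """Follow HEAD to a commit ID, handling loose refs only (no packed-refs)."""
    head = (git_dir / "HEAD").read_text().strip()
    if head.startswith("ref: "):
        # Attached HEAD: the file names a branch, and the branch file holds the ID.
        return (git_dir / head[len("ref: "):]).read_text().strip()
    return head  # detached HEAD stores the commit ID directly


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    fake_id = "0123456789abcdef0123456789abcdef01234567"  # made-up ID
    (git_dir / "refs" / "heads" / "main").write_text(fake_id + "\n")

    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    attached = resolve_head(git_dir)

    (git_dir / "HEAD").write_text(fake_id + "\n")
    detached = resolve_head(git_dir)
```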

I work with some very good programmers who don't worry about code. Past a certain point it bites. You have to be thinking about scalability early so someone else doesn't have to refactor someone else's pride and joy because it's become full of overly concrete logic.
Like a lot of things Linus, it's very pretentious and aloof but right at the core of it. Code matters a lot and bad code can tank performance, stop evolution and introduce security issues. But with Git, this is a mostly truthful statement.
I remember reading a quote where he states that when looking at new code he starts with data structures, to get an understanding of what's going on. Or something to that effect.

That would be more applicable here. But I couldn't immediately find it, so I pasted this one instead, which is somewhat close but not perfectly related to the OP.

Git is definitely not simple. It's simple if you have a solid understanding of data structures (trees, graphs), and know the concept of a pointer. Not everyone has that background. The concepts are learnable, but the commands have complex behavior that often requires a reference to use properly. The commands aren't simple by any measure because of all of the edge cases that exist.
it's almost as if git is a tool that was designed for software engineers and not accountants
We do have regular humans using git internally - godsend for remote work. They manage fairly well because they don't really need to do anything complicated.

There are some funny neologisms like "check it out on the git" since they don't know what git actually is vs. how to use it, but still.

Complexity should not be excused solely because the target audience is “smart” people.
This is the way I understand git. For me it's dead simple. I've tried to teach others over the years. Not a single person has got it so far.
So what's your conclusion from this failure? That everyone else is stupid? You're a bad teacher? Or it's not actually simple and the above explanation includes a ton of implicit understanding of the subtle interactions of the various moving parts?
For the majority of programmers I think it's lack of experience with data structures of any kind. C programmers have to understand pointers and most (I assume) would have implemented at the very least their own linked list at some point and maybe even a tree. But there are so many programmers who simply lack this experience so talk of pointers, links, graphs etc. is unfamiliar.

Then there are those whom I'm sure should have the necessary experience (because they are C programmers, for example), but still don't seem to get it. These people I think just don't care. They don't care about version control and therefore it's irrelevant what git is trying to represent. They just want to get their code merged.

That reminds me of the old joke about monads: At the moment you finally understand them, you lose the ability to explain them.

Seriously, I too find the basic concepts of git quite simple. But whenever I want to do anything slightly out of the ordinary, I find myself wasting a lot of time searching the docs. In fact, I find the naming of commands and their options almost the opposite of intuitive, given my understanding of the basic model.

I can't use the command line at all. It's horrendous and makes no sense. I use magit for everything if I can. If I can't then, like you, I have to spend ages searching the docs.
The way I tried to understand Git at first was like Subversion. Horrible. I almost deleted everything my team worked on for weeks.

Then I read "git inside out" [1] (not to be confused with "git from the bottom up" which I think is not as good), had an "aha!" moment, my view changed and everything became clear and easy. Transformation from graph to graph is something I do every day, so why not in Git?

[1] https://www.slideshare.net/MichaelNadel/git-inside-out-57904...

More precisely, a directed acyclic graph (DAG), which I suppose is still a type of graph. That said I'm not sure if "graph" is conceptually better than "tree with cross connections". People who struggle with git may not have a good enough grasp on the differences between these structures that insisting on using the "proper" names is immediately helpful.
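
For anyone unsure where the line between the two structures falls, a short Python sketch may help. It assumes history is a dict from commit ID to parent IDs (the IDs are invented): branching alone still fits a tree, while a merge commit with two parents is the "cross connection" that makes it a DAG.

```python
def has_merge(parents):
    """True if any commit has two or more parents, which no tree allows."""
    return any(len(p) > 1 for p in parents.values())


def is_acyclic(parents):
    """Depth-first check that following parent links never loops back."""
    done, in_progress = set(), set()

    def visit(commit):
        if commit in done:
            return True
        if commit in in_progress:
            return False  # came back to a commit still being explored
        in_progress.add(commit)
        ok = all(visit(p) for p in parents.get(commit, ()))
        in_progress.discard(commit)
        done.add(commit)
        return ok

    return all(visit(commit) for commit in parents)


# Hypothetical history: "b" and "c" branch off "a", then "m" merges them.
branched = {"a": (), "b": ("a",), "c": ("a",)}
merged = {**branched, "m": ("b", "c")}
```

Here `branched` is still a tree, while `merged` is acyclic but no longer a tree because `m` has two parents.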