by tootie 4654 days ago
There is an inverse relationship between the number of articles titled "x explained simply" and the actual simplicity of x. I honestly don't understand why the developer community refuses to admit the obvious that git is unholy clusterfuck of a product. It has a nice data structure inside it? Name another end-user product for which you are even vaguely aware of what data structures were used.
6 comments

Word. I have a theory that while git has completely taken over the software industry and most users can "get shit done" with it, very few of them actually understand what the fuck is going on. All those "simple explanations" are really nice theoretical pieces about the merits and design of git itself, and most of the time they completely ignore (or dodge) the clusterfuck that everyday git can be, especially for people starting out with it.

I'm talking about the botched merges, the hour-long rebases with 156 git rebase --skip, the "your branches have diverged" mysteries, the subtle differences between fetch and pull, the fact that all the GUIs I've seen so far, far from being a tool, actually complicate the task with their own little syntax.

If the underlying data structure is so beautiful, then how come there's no UI where I can simply drag branches around, have an actual "OOPS, MISTAKE, LET ME UNDO" button, reorder my commits with the mouse, something that holds my hand and actually cares about ALL git users, not the 1% connoisseur elite?

Don't get me wrong, I have zero doubt that git is in fact an absolutely great tool with a very intelligent design and that the users are to blame for not understanding it, I'm just thinking that if after more than 5 years of git-as-a-dominant-dvcs I keep seeing the same puzzled faces looking at a series of SHA-1 like it's the answer to the universe over and over again, there's something that's not quite right.

That being said, this is indeed one of the most comprehensive articles I've ever read about git.

> the subtle differences between fetch and pull

While I agree with some of your criticisms, as the resident 'Git guy', I've found no difficulty in explaining to people that pull is a convenience alias for 'fetch followed by merge'.
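
For anyone who wants that alias spelled out, here is a toy sketch in Python. It is not how git is implemented, and every name in it (ToyRepo, fetch, merge, pull) is made up for illustration. The only point is that fetch touches the remote-tracking ref and nothing else, merge moves your own branch, and pull is one after the other.

```python
# Toy model only: a commit is an id plus its parent ids, and refs map
# names to commit ids. All names here are hypothetical, not git's API.

class ToyRepo:
    def __init__(self):
        self.commits = {}   # commit id -> tuple of parent ids
        self.refs = {}      # ref name -> commit id

    def commit(self, cid, branch, extra_parents=()):
        parent = self.refs.get(branch)
        parents = (() if parent is None else (parent,)) + tuple(extra_parents)
        self.commits[cid] = parents
        self.refs[branch] = cid

    def ancestors(self, cid):
        # Every commit reachable from cid, including cid itself.
        seen, stack = set(), [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.commits[c])
        return seen

    def fetch(self, remote, branch):
        # Copy commits over and update ONLY the remote-tracking ref.
        # (Simplification: copies every commit the remote has.)
        self.commits.update(remote.commits)
        self.refs["origin/" + branch] = remote.refs[branch]

    def merge(self, branch, other_ref):
        ours, theirs = self.refs[branch], self.refs[other_ref]
        if theirs in self.ancestors(ours):
            return "already up to date"
        if ours in self.ancestors(theirs):
            self.refs[branch] = theirs          # just slide the branch forward
            return "fast-forward"
        merge_id = "merge-%s-%s" % (ours, theirs)
        self.commit(merge_id, branch, extra_parents=(theirs,))
        return "merge commit"

    def pull(self, remote, branch):
        # The whole "convenience alias": fetch, then merge.
        self.fetch(remote, branch)
        return self.merge(branch, "origin/" + branch)


remote = ToyRepo()
remote.commit("a", "main")

local = ToyRepo()
local.fetch(remote, "main")
local.refs["main"] = local.refs["origin/main"]   # crude stand-in for a clone

remote.commit("b", "main")
local.fetch(remote, "main")
print(local.refs)                        # main is untouched, origin/main moved
print(local.merge("main", "origin/main"))
```

After the fetch alone, the local main has not moved; only the merge step changes it. That gap is most of what confuses people about pull.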

True, but to understand that you also need to understand what fetch and merge each do. Not so easy for the layperson/beginner. You're correct that this one is a bit of dishonesty on my part though :)
Yep, that's true. I think I've nearly taught our testers the difference. :D
> Name another end-user product for which you are even vaguely aware of what data structures were used.

Unix.

You are operating primarily on a tree of files and streams of text. To operate on these you have a wide array of utilities that perform simple tasks (and a handful that perform complex tasks as well) that, when composed, allow you to perform any transformation you want. You can get a freshman CS student off the ground with Unix like systems in what, one lecture?

Because the (bleedingly simple) data model is the focus when learning Unix, you don't need to memorize every single little edge case of the system. Knowledge of the data model alone is enough to tell you what sort of things can or cannot be done, and general knowledge of the sort of thing that a few utilities do is enough to bootstrap yourself. If you want to list out some files in some particular way, you may not know immediately what exactly to type, but you probably do know that ls or find is a decent place to start looking.

Except there isn't a single version of UNIX, each flavour has its own deviations.
Technically there isn't a single version of Git either, though (thankfully) they can all operate on the same repositories (whereas in UNIX land, you'll find different file systems that the others do not support). They do however have some different capabilities. jGit for instance can push to S3, which is pretty neat.
So I get downvoted by stating the truth. It shows how many HN readers have done portable development across real UNIX systems.
Git and Linux were both invented by Linus Torvalds.
Linus just created a kernel that can be used with a Unix-like system. The difference here isn't academic, what I am talking about above was created before Linus was born. (mildly interestingly, he apparently missed Unix Epoch by only a few days)
After looking at Git I wonder how Linus could ever constrain himself to POSIX? How come Linux system calls don't have ten optional parameters each? Some of them actually mandatory, some changing the meaning of the whole call?

Why go with boring open, creat, read, write when you can have rerere and prune and annex and reflog?

Not sure what you're getting at here, git developers got to choose their own names for git commands because there was not an existing standard that they were trying to implement.

Do you really think that 'prune' is a worse name than, say, 'fcntl'?

(Also, git-annex is a separate project from git.)

I think part of the problem is that git, like other version control systems, does not enforce any particular way of working (workflow).

When authors try to explain how to use git, they often have a very particular workflow in mind and don't necessarily describe exactly what that workflow is. This can cause major problems when somebody tries to apply advice to their own workflow.

I like the graphical approach. It helps to conceptualize what is going on so that someone can apply it to their own situation.

That's not it. Following any reasonable workflow still requires a set of arcane commands and flags that only make the slightest bit of sense if you know how git is implemented. I was able to use SVN successfully without ever knowing a thing about the implementation. I've generally had the same experience with HG and even CVS and VSS back in the day. Git adds sophistication over a product like SVN, but adds vastly more complexity.
It's that its porcelain is a clusterfuck. The staging area is a hack that greatly convolutes the UI, and it would have been better if it were left out and you just had a way to cherry-pick what you want to commit at commit time (if it wasn't everything).

There's no symmetry in commands. The opposite of "git commit" is "git reset --soft HEAD^", not "git uncommit". "git reset" is three commands in one. I could go on and on.
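
To make the reset complaint concrete, here is a toy Python sketch of the commit-level form of reset and its three modes. It is a simplified model, not git's implementation: the class TinyRepo, its fields and the helper two_commit_repo are all invented for illustration, and the working tree is treated as if every file were tracked. The only difference between --soft, --mixed and --hard is how many of the three areas (HEAD, index, working tree) get rewound.

```python
# Toy model only. TinyRepo and two_commit_repo are hypothetical names.

class TinyRepo:
    def __init__(self):
        self.history = []    # committed snapshots (dicts), oldest first; last is HEAD
        self.index = {}      # the staging area
        self.worktree = {}   # files "on disk" (all assumed tracked)

    def add(self, name):
        self.index[name] = self.worktree[name]

    def commit(self):
        self.history.append(dict(self.index))

    def reset(self, steps_back, mode="mixed"):
        # Every mode moves HEAD back.
        if steps_back:
            del self.history[-steps_back:]
        head = self.history[-1] if self.history else {}
        # --mixed and --hard also rewind the index.
        if mode in ("mixed", "hard"):
            self.index = dict(head)
        # --hard additionally overwrites the working tree.
        if mode == "hard":
            self.worktree = dict(head)


def two_commit_repo():
    repo = TinyRepo()
    for content in ("v1", "v2"):
        repo.worktree["f"] = content
        repo.add("f")
        repo.commit()
    return repo


for mode in ("soft", "mixed", "hard"):
    repo = two_commit_repo()
    repo.reset(1, mode)      # roughly "git reset --<mode> HEAD^"
    print(mode, repo.history[-1], repo.index, repo.worktree)
```

In this model the "uncommit" people ask for is just the soft case: HEAD goes back one commit while the index and the files keep the newer content, so committing again recreates what was undone.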

git was designed as a data model and then a series of slapdash commands that enabled manipulation of that data model. It wasn't designed from the end-user perspective backwards, and this really, really shows.

I love the staging area, and I miss it every time I have to use SVN at work.

I actually use the GIT-SVN bridge to work with git locally, pushing my changes up to SVN when I've resolved a topic. I do this in part because of the staging area. I have accidentally included changes in SVN commits so many times that I try to avoid SVN altogether.

I agree though, so many of the commands are just plain painful, the reset command(s) is a perfect example.

You have it right that it was designed as a data model. Linus has said a few times that it wasn't originally intended to be the Source Control Management tool itself, but more like a kit for building an SCM. Sadly, it took off so quickly simply because Linus built it, and it was adopted as the end-user solution.

I have really mixed feelings about the index, personally. It is nice to be able to incrementally build up a change, but I do think that it's a thing that complicates learning for new users.

I'm curious, as someone who actually loves it, what if there were just better tools for incrementally altering the last commit instead? I'm not sure that a more robust git commit --amend couldn't achieve the same goals as the index with less conceptual overhead.

You're basically railing against orthogonality. I don't think anybody disagrees that some flags could be made more consistent, but creating a separate git-uncommit when that functionality is fundamentally encompassed by the purpose of the third invocation of git-reset is a mistake.

If you make a specific case for 'git reset --hard/--soft HEAD^' then you would either be left without the more general (and more useful) 'git reset --hard/--soft <commit>', or you would have a situation where you have to use 'uncommit' to (for example) re-commit something, or you would be left with a git-uncommit command and still have git-reset for related operations. Why would any of that be better?

You could split the third invocation of git-reset off into its own command, but honestly I don't see the utility in doing that.

Disagree. A git-uncommit command tells you by its name what it's going to do, even if you know nothing about git or even if you don't know what a commit is, you can assume that git-uncommit will undo it.

git reset --hard tells you absolutely nothing obvious about what it's going to do unless you understand git and its specific incantations. You have to learn git commands, rather than intuit them.

git add -p

       -p, --patch
           Interactively choose hunks of patch between the index and the work
           tree and add them to the index. This gives the user a chance to
           review the difference before adding modified contents to the index.

           This effectively runs add --interactive, but bypasses the initial
           command menu and directly jumps to the patch subcommand. See
           “Interactive mode” for details.
I might not [EDIT: in fact, was not: see reply below] be appreciating the way that the staging area convolutes the UI for you, but you can pretend it doesn't exist by always just doing commit -a. And, if you want to "cherry-pick what you want to commit at commit time", well, that's what the staging area is for, isn't it?
Pretending it doesn't exist won't unconvolute the UI. I'm talking about things like "git checkout -- <file>" and "git reset", which could just be one command if they didn't have to manage getting things in and out of the staging area.

I much prefer bzr's porcelain, it's much better designed and more sane.

I had to use svn in college for some projects and we frequently ran into terrible merge problems that we simply couldn't figure out because we didn't know what was actually going on behind the scenes.

git is more complex on the surface, but I really find it to be so much simpler when you end up in real nontrivial use cases, especially when something goes wrong. I would agree that the commands are sometimes too memorization-intensive, but I've never been in a situation with git, even as a beginner, that I couldn't figure out and resolve relatively easily, especially with all the online resources available. I can't say the same for svn. Maybe I was just an idiot when using svn, but if I was an idiot there, I don't see why I wouldn't be an idiot with git as well unless there was just something fundamentally more useable and flexible and understandable about git.

The underlying data structures do matter, even with mercurial. One workflow I immensely appreciate with Git is locally committing some series of messy changesets and commit messages and then afterwards, when I have finished the feature, tidying everything up by rewriting the history (git rebase -i) before I push my commits.

When I tried the same approach with mercurial I found that rewriting the history is not reasonably supported by mercurial. They use these append-only data structures that sound nice, because they assure stability (no already written data is ever in danger because files are only appended) and fit with the general concept of commit->pull->merge. But when you want to change the commit history you clash with the append-only concept. The Histedit extension which is meant to provide this feature has warnings all over the place that what I do is dangerous and might result in data loss with changeset backups written to locations in the working tree. It's horrible compared to Git.

Now don't misunderstand me, Git has a lot of problems UI wise (for example I still don't really grasp what is going on with detached heads). But I find the fundamental design choices more sound than with any other VCS that I have tried.

Your rebase-based workflow is easily achievable with Mercurial. And there's even an --outgoing switch in histedit.

> Histedit extension which is meant to provide this feature has warnings all over the place

Ignore them. You do know what you are doing, right?

> changeset backups written to locations in the working tree

That's not true. Backups are written to .hg/strip-backup directory, which isn't tracked.

> Ignore them. You do know what you are doing, right?

No, I don't :). I heavily rely on my VCS to never lose any data and to be able to go back to a previous state whenever I need to. This works great with Git's reflog.

> That's not true. Backups are written to .hg/strip-backup directory, which isn't tracked.

I stand corrected then. Maybe I'm mixing that up with amend or revert? I last used mercurial 9 months ago, but I seem to remember there was some command that left .orig files lying around, and that if you applied a history rewrite command several times, those backups were overwritten with the new backups and you couldn't get all the way back.

> to be able to go back to a previous state whenever I need to

as I've said, old commits are backed up.

> This works great with Git's reflog.

except git reflog is cleaned on git gc

Not to mention that the ChangesetEvolution feature (http://mercurial.selenic.com/wiki/ChangesetEvolution) is in development.

> there was some command that left .orig files lying around

The only way .orig files may pop up is a failure to replay rebased commit(s). These are not backups; their purpose is to let the user fix things and continue.

I used SVN for years without understanding it and that caused me to get weird merge errors I never managed to figure out. Git on the other hand I understood after a few weeks of using it.

Since I failed to understand SVN after years of use I would say it is more complex.

That is where I am at. Being able to use something for years while never actually understanding it is not a desirable position to be in, and really does not equate to any meaningful sense of simplicity. It particularly is not something that developers should strive for in a system built for developers.
Conversely, SVN's data model remained opaque after years of daily use simply because it solves the problem at hand poorly and is not defined very well.
I'm curious: did you use branches and labels in SVN? I've come across many svn repositories that don't use the trunk/branches/tags layout, and as a result the developers keep completely separate repositories for slightly different versions of their projects instead of creating branches. I've even seen new repositories created for each release version of the project.

If you're using svn you need to understand the implementation to get why copies are cheap, so that you can understand how to use branching and tagging appropriately.

The problem is that with SVN, you can get in gradually. To start, the branchless approach still gets you a great service - obviously an incomplete one, but better than not having source control at all. And you can have leaders who are the only people who have to think about branching and merging and have people very gradually move into that role.

Git seems to force you to dive straight into the deep-end, since everything is a branch, even your own local working folder.

Branches are fundamentally little more than text files in .git/refs/heads that contain the sha of a commit object. Don't let the idea of a branch frighten you, there isn't much complexity behind the idea.
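
To see how little is involved, here is a small Python sketch that fakes the layout in a throwaway directory. It makes no git calls; the helpers write_branch and read_branch and the commit id are made up for illustration. A real repository can also keep refs in a packed-refs file, so in practice use git's own commands rather than poking at these files.

```python
import tempfile
from pathlib import Path

# Hypothetical helpers that mimic the loose-ref layout under .git/refs/heads.

def write_branch(git_dir, name, sha):
    ref = Path(git_dir) / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)
    ref.write_text(sha + "\n")        # one line: the commit id

def read_branch(git_dir, name):
    return (Path(git_dir) / "refs" / "heads" / name).read_text().strip()

FAKE_COMMIT = "0123456789abcdef0123456789abcdef01234567"  # invented 40-hex id

with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"
    write_branch(git_dir, "main", FAKE_COMMIT)

    # "Creating a branch" is writing one more small file with the same id.
    write_branch(git_dir, "feature", read_branch(git_dir, "main"))

    print(sorted(p.name for p in (git_dir / "refs" / "heads").iterdir()))
    print(read_branch(git_dir, "feature"))
```

Nothing is copied except a 40-character id, which is why branching in this model costs next to nothing.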
Well, I know _that_ copies are cheap in svn, but I have no idea _why_. Yet I hope that I grok branches and tags.
After reading the post (and many others), I think I'm starting to get it:

-You have a directory/folder (data store) where all of the files are stored

-Some files (blobs) are data files

-Some files (trees) are hierarchy - they specify the structure of the data (blob) files, and how they connect to each other (like a database)

-Some files (commits) are snapshots (think VM snapshot) of the arrangement of the blobs on the trees (data file structure)

-Some files (tags) hold metadata about the data files (blobs)

And it's all tied together by SHAs (GUIDs) which are just a random number that's so huge it's probably unique.

You can use URLs to point to any of these files, so you can tie together files from different locations at once.
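
One refinement to the list above: the SHAs are not random numbers. In a classic SHA-1 repository, a blob's id is the SHA-1 hash of a short header plus the file's bytes, so identical content always gets the identical id. The Python sketch below shows the blob case only; the function name git_blob_id is made up, and trees, commits and tags are hashed the same way with their own type word in the header.

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    # Header is the object type, a space, the byte length, and a NUL byte.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# Same bytes in, same id out: nothing random about it.
print(git_blob_id(b"test content\n"))
print(git_blob_id(b"test content\n") == git_blob_id(b"test content\n"))
print(git_blob_id(b""))   # the id of an empty file
```

The ids should match what git hash-object reports for the same bytes. This is also why two people who commit the same file end up with the same blob without coordinating.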

That argument doesn't really make sense. If x were fundamentally simple, why would I need a simple explanation for it? Clearly git is, or can be, complex, and a simple, simpler or simplified explication would be helpful.

None of that implies that "git is [an] unholy clusterfuck of a product," it means that git is complicated. For 99% of the work you do with git, it isn't even that complicated and you don't need to be aware of the data structure. As for the last 1%, well, that's what separates git from other (D)VCSs. Git gives me the power to do a lot, and incidentally, it gives me the power to shoot my foot off too...I still prefer it over say svn or hg, my personal opinion.

It is simple because a simple explanation is possible. Simple doesn't mean "intuitive to proverbial grandmother".

Git is clever, moderately novel and therefore unfamiliar (depending on your background), and simple. There are not many concepts present, and the concepts that are there are not difficult to understand, but those concepts need brief introduction because they are concepts that many will be unfamiliar with.

If you buy a checkers board it will come with a (very simple) rulebook. You aren't born with some sort of natural checkers ability, you have to learn it. Nobody would claim that checkers isn't simple though.

That's not the strongest argument. If you want to get anything done at a reasonable level, checkers is hard (American checkers is easier than international checkers, but neither is really simple).

Git may well be similar: relatively simple rules, yet hard to use proficiently.

Checkers is a very easy game, but with those incredibly simple rules you can get complex behavior. For an even more extreme example, you can look at Go. Git is similar, except there is no competition/competitor there to befuddle you. From the simple components/rules (you've got what, you can perform incredibly complex operations that are infeasible with lesser VCSs.

Simple rules/components, complex gameplay/capability.

If you want to play it simple, git does that too.

> Git is similar, except there is no competition/competitor there to befuddle you.

You haven't met some of the developers on my team.

I think in many ways it comes down to the way people think about and use the product. For me svn and perforce are way more of unholy messes than git is and I find them considerably more frustrating to use.