by nickpsecurity 3793 days ago
That's a good point. I've put off learning Git as long as I can, but almost everything on my to-do list uses it heavily or, as you said, ties into it. So, I'm going to have to bite the bullet and learn it.

Yet I could have sworn Git fans told me its decentralized design avoids single points of failure: everyone has a copy and can keep working when a node is down, just without necessarily being able to coordinate or sync in a straightforward way. This situation makes me think that, either for Git or just Github, there's some gap between the ideal they described and how things work in practice. I mean, even CVS or Subversion repos on high-availability systems didn't have 2 hours of downtime in my experience.

When I pick up Git/Github, I think I'll implement a way to constantly pull everything from Git projects into local repos and copies, probably with non-Git copies as a backup. I also used to use append-only storage for changes in potentially buggy or malicious services. Sounds like that might be a good idea here, too, to prevent some issues.
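A minimal sketch of that plan, assuming a POSIX shell and stock git. The helper names sync_mirror and snapshot are hypothetical, and the demo uses a throwaway local repo as the "upstream" so it runs offline. "git clone --mirror" copies every ref, a later "git fetch --prune" refreshes them, and "git archive" writes the plain non-Git tarball.

```shell
set -eu
# Throwaway identity so the demo commits work on any machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# sync_mirror URL DIR (hypothetical helper): clone once with --mirror,
# then refresh every branch and tag on later runs.
sync_mirror() {
  if [ -d "$2" ]; then
    git --git-dir="$2" fetch --quiet --prune origin
  else
    git clone --quiet --mirror "$1" "$2"
  fi
}

# snapshot DIR OUT (hypothetical helper): export the files at HEAD as a
# plain .tar.gz, i.e. a copy with no Git history or metadata.
snapshot() {
  git --git-dir="$1" archive --format=tar.gz -o "$2" HEAD
}

# Demo against a local stand-in for the real upstream.
work=$(mktemp -d)
git init -q "$work/upstream"
echo v1 > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m v1
sync_mirror "$work/upstream" "$work/mirror.git"    # first run clones

echo v2 > "$work/upstream/README"
git -C "$work/upstream" commit -q -am v2
sync_mirror "$work/upstream" "$work/mirror.git"    # later runs fetch
snapshot "$work/mirror.git" "$work/backup.tar.gz"
```

In practice you'd loop sync_mirror over a list of real URLs from cron or a systemd timer.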

I'm sorry to be rude, but it sounds like you should go learn Git and come back to this conversation.

The decentralized design does avoid single points of failure, and everyone does have a copy. So: check, check, great. Unfortunately (maybe..) everyone has put their master repos in the same place, which somewhat counteracts the decentralization. But there is no immediate coupling between the Git repository on your computer and the Github repository it pulls from. Github being down in no way prevents you from working on code you've already checked out, unless you need to fetch more code.

(The same obviously may not be true for package managers and build scripts that are not running in isolation from your upstream repository, which is where the problems have arisen.)

"I'm sorry to be rude, but it sounds like you should go learn Git and come back to this conversation."

It looks like it.

"The decentralized design does avoid single points of failure, and everyone does have a copy."

So, like many decentralized systems I've used, a master node gets worked around by other nodes that communicate some other way? Or would some broken situation be possible where...

"Unfortunately (maybe..) everyone has put their master repos in the same place, which somewhat counteracts the decentralization."

...one node going down could prevent collaboration? Oh, you answered that. That sounds better than CVS but shit by distributed-systems standards. I'll still learn it anyway since everyone is using it, probably in the next week or two.

No, it's not the same as a distributed system with master/slave nodes. The child nodes can function entirely in isolation from the parent. If you wanted to, you could treat another coworker's node as your master and download/upload to that. It's usually easier to have a tree structure where the root is your master repo, its children are your build servers or whatever, and the leaves are development machines. But that's entirely reconfigurable.
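A minimal sketch of treating a coworker's clone as a remote once the usual hub is gone. Local paths stand in for real machines (a real coworker's URL would typically be SSH or a shared path), and the names alice, bob, and hub are made up for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

# Alice starts a project; the hub and Bob's clone both descend from it.
git init -q "$work/alice"
echo base > "$work/alice/notes.txt"
git -C "$work/alice" add notes.txt
git -C "$work/alice" commit -q -m base
git clone -q --bare "$work/alice" "$work/hub.git"
git clone -q "$work/hub.git" "$work/bob"

rm -rf "$work/hub.git"          # the hub goes down

# Alice keeps committing locally; nothing here needs the hub.
echo fix >> "$work/alice/notes.txt"
git -C "$work/alice" commit -q -am fix

# Bob adds Alice's clone as another remote and pulls from it directly.
branch=$(git -C "$work/bob" symbolic-ref --short HEAD)
git -C "$work/bob" remote add alice "$work/alice"
git -C "$work/bob" fetch -q alice
git -C "$work/bob" merge -q --ff-only "alice/$branch"
```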

It's not surprising at all that if you make a master repo at the root of the tree and it goes down, then you can't communicate with it. But that doesn't prohibit communication between the other nodes or re-wiring the tree, and it definitely doesn't inherently block development work on any of the other nodes.
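Re-wiring is mostly one command per clone. A minimal sketch, again with local paths standing in for servers (old-root.git and new-root.git are hypothetical names): any up-to-date clone can seed a replacement root with a bare clone, and each developer repoints origin with "git remote set-url".

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/dev"
echo hello > "$work/dev/file.txt"
git -C "$work/dev" add file.txt
git -C "$work/dev" commit -q -m initial
git clone -q --bare "$work/dev" "$work/old-root.git"
git -C "$work/dev" remote add origin "$work/old-root.git"

rm -rf "$work/old-root.git"     # the root goes away

# Seed a new root from any up-to-date clone, then point origin at it.
git clone -q --bare "$work/dev" "$work/new-root.git"
git -C "$work/dev" remote set-url origin "$work/new-root.git"

# Commits and pushes carry on against the new root.
echo more >> "$work/dev/file.txt"
git -C "$work/dev" commit -q -am 'more work'
git -C "$work/dev" push -q origin HEAD
```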

It just so happens, though, that people's build scripts and package managers like to refresh packages from the root and don't handle the failure modes of that operation very well. That's the only place problems emerge, besides the obvious fact that if your public releases of software go through the root and the root is down, then you can't release until it's back up. But you could easily make a new root if you wanted to.

"It just so happens, though, that people's build scripts and package managers like to refresh packages from the root and don't handle the failure modes of that operation very well."

That's the critical part. So, countering this risk is apparently a manual thing if one uses off-the-shelf tooling for Git. I'll just have to remember to look at that if I do a deployment. Put it on a checklist or something.

>So, countering this risk is apparently a manual thing if one uses off-the-shelf tooling for Git.

Not so much off-the-shelf tooling for Git; it's more off-the-shelf tooling for Node/Ruby/Go/Rust/PHP.

Nothing about Node's npm really requires it to depend on GitHub; in fact, I think you can use any Git repo. It's just that most tend to use a single Git repo, and there is no way to configure mirrors.
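Git itself does have a URL-rewriting setting that can act as a crude mirror config: url.<base>.insteadOf. It only helps when a package manager actually shells out to git (some download tarballs over HTTPS instead), so treat this as a sketch under that assumption, not a fix for every tool. The mirror host in the comment is hypothetical; the demo scopes the setting to one command with "git -c" and uses a local directory as the mirror so it runs offline.

```shell
set -eu
# Persistent version (hypothetical mirror host):
#   git config --global url."https://git-mirror.example.com/".insteadOf "https://github.com/"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_TERMINAL_PROMPT=0    # never block on a credential prompt
work=$(mktemp -d)

# Local stand-in for a mirror of github.com/example/project.
mkdir -p "$work/mirror"
git init -q "$work/mirror/project"
echo hi > "$work/mirror/project/README"
git -C "$work/mirror/project" add README
git -C "$work/mirror/project" commit -q -m init

# The GitHub URL prefix is rewritten to the mirror before git connects.
git -c "url.$work/mirror/.insteadOf=https://github.com/example/" \
  clone -q https://github.com/example/project "$work/checkout"
```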

Thanks for the extra detail.

"and there is no way to configure mirrors."

Is that in Git itself or the project-specific tooling you're mentioning?

This is a social problem, not a technical one.
It's a PEBKAC issue. The software is fully capable of having multiple remotes, but it's rarely used that way.
Is there an easy config for that? Suppose I want to push to, e.g., GitHub and Bitbucket (without sharing my creds with IFTTT or similar). Is a post-receive hook on a local pseudo-master the way to go?
See, for example, here: http://stackoverflow.com/questions/14290113/git-pushing-code...

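    # The first --add --push replaces the default push URL, so list the original too.
    # git:// is normally read-only, so push over SSH.
    git remote set-url --add --push origin ssh://original/repo.git
    git remote set-url --add --push origin ssh://another/repo.git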
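To see the push-URL approach work without touching real hosts, here is a small sketch using two local bare repos as hypothetical stand-ins for GitHub and Bitbucket. One "git push" updates both, with no hook and no third-party service involved.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)
git init -q --bare "$work/github.git"       # stand-in for the first host
git init -q --bare "$work/bitbucket.git"    # stand-in for the second host

git init -q "$work/dev"
echo hello > "$work/dev/README"
git -C "$work/dev" add README
git -C "$work/dev" commit -q -m init

git -C "$work/dev" remote add origin "$work/github.git"
# Once any push URL exists, only push URLs are used, so list both.
git -C "$work/dev" remote set-url --add --push origin "$work/github.git"
git -C "$work/dev" remote set-url --add --push origin "$work/bitbucket.git"
git -C "$work/dev" remote -v                # one fetch line, two push lines
git -C "$work/dev" push -q origin HEAD

branch=$(git -C "$work/dev" symbolic-ref --short HEAD)
```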
Lol. Nicely put.
Git works as advertised, but a lot of build scripts start with a sync from the upstream master (the equivalent of "svn up") and require that sync to succeed, so they throw away that advantage when building.

Everyone with a checked-out repo should still have been able to develop, commit, branch, and merge locally, though.
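For build scripts, one way out is to treat the upstream sync as best-effort. A minimal sketch, assuming a POSIX shell; refresh_or_continue is a hypothetical helper, and whether building from a possibly stale checkout is acceptable is a policy call for your project.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# refresh_or_continue DIR (hypothetical): try to fetch from origin, but
# fall back to the local clone's history instead of failing the build.
refresh_or_continue() {
  if git -C "$1" fetch -q origin 2>/dev/null; then
    echo "fetched latest from origin"
  else
    echo "origin unreachable; building from local $(git -C "$1" rev-parse --short HEAD)" >&2
  fi
}

work=$(mktemp -d)
git init -q "$work/upstream"
echo 'build me' > "$work/upstream/README"
git -C "$work/upstream" add README
git -C "$work/upstream" commit -q -m init
git clone -q "$work/upstream" "$work/build"

refresh_or_continue "$work/build"    # upstream reachable
rm -rf "$work/upstream"              # simulate the hub going down
refresh_or_continue "$work/build"    # warns, but the build can proceed
```

A real script would follow a successful fetch with a fast-forward merge or checkout of the fetched ref; the point here is only that a failed fetch is a warning, not a fatal error.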

Thanks for the clarification. This is the exact sort of thing I was wondering about.
> either for Git or just Github, there's some gap between the ideal they described and how things work in practice

The hub-spoke topology is the easiest way of distributing source code to a lot of people. If the hub goes down, this is what happens. If that leads to a halt in productivity, then that is a failure in contingency planning. Git gives you many tools to distribute your workflow, but that won't save you if your workflow is centralized around Github.

Granted, sometimes you don't really have a choice whether to depend on Github, such as when working with language package managers. Perhaps that goes to show that mirroring and resiliency should be a design consideration in those tools, but it's not a shortcoming of Git itself.

> even CVS or Subversion repos on high availability systems didn't have 2 hours of downtime

It's easier than ever to have HA with a DVCS: clone the repository somewhere else and keep it in sync with commit hooks.
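A minimal sketch of the hook approach, which also fits the post-receive question upthread. It assumes a shared bare repo you control; the remote name backup and the local paths are hypothetical stand-ins for a second server. After every accepted push, the hook runs "git push --mirror" to copy all refs. On a single developer machine, a post-commit hook would be the analogous place.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q --bare "$work/primary.git"
git init -q --bare "$work/mirror.git"
git --git-dir="$work/primary.git" remote add backup "$work/mirror.git"

# post-receive runs inside primary.git after each push it accepts.
cat > "$work/primary.git/hooks/post-receive" <<'EOF'
#!/bin/sh
exec git push --quiet --mirror backup
EOF
chmod +x "$work/primary.git/hooks/post-receive"

# A developer pushes to the primary as usual.
git init -q "$work/dev"
echo v1 > "$work/dev/README"
git -C "$work/dev" add README
git -C "$work/dev" commit -q -m v1
git -C "$work/dev" push -q "$work/primary.git" HEAD
branch=$(git -C "$work/dev" symbolic-ref --short HEAD)
```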

Large FOSS projects (should) do this by keeping a self-hosted repository and mirroring it somewhere else like Github, Bitbucket, etc. Internally, an org should be able to quickly stand up an SSH or HTTP server for the purpose, or have collaborators push and pull directly from each other. Worst case? Send patches. git apply works really well (git am for patches made with git format-patch), and you might be surprised at how clever git-merge is when everyone finally syncs up.
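A sketch of the send-patches fallback between two clones with no server at all; the paths are local stand-ins and the names are made up. "git format-patch" turns commits into mail-style files, and "git am" replays them as commits with the original author and message ("git apply" would apply a plain diff to the working tree without committing).

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/alice"
echo 'line 1' > "$work/alice/notes.txt"
git -C "$work/alice" add notes.txt
git -C "$work/alice" commit -q -m 'start notes'
git clone -q "$work/alice" "$work/bob"

# Alice commits, then exports that commit as a patch file.
echo 'line 2' >> "$work/alice/notes.txt"
git -C "$work/alice" commit -q -am 'add line 2'
git -C "$work/alice" format-patch -q -1 -o "$work/patches"

# Bob receives the file by any channel (mail, USB stick) and applies it.
git -C "$work/bob" am -q "$work/patches"/*.patch
```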

That's what it means to be distributed: there is no real concept of a "central" node, unlike Subversion. Every local checkout has a full copy of the repository history. Any centralization is a (somewhat understandable) incidental artifact of how Git is being used.

Makes sense. I'll try to remember that for my future checklist. Thanks for the details. Btw, your site is down on my end from 2 browsers on my desktop and one on mobile. Might want to look into that, since the rest are working.
> Btw, your site is down on my end

Hah, because it's been defunct for a while now. Thanks for the reminder, removed it from my profile.

Cool
> I used to also use append-only storage for changes in potentially buggy or malicious services. Sounds like that might be a good idea, too, to prevent some issues.

In a certain sense, git is "append-only". If you change a commit in history, every descendant commit gets a new SHA hash, because each commit records the hash of its parent. Naturally, this will conflict with other copies of the repository.
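A small demonstration of why rewritten history is visible everywhere, assuming stock git plumbing. "git commit-tree" builds two child commits that are identical (same empty tree, message, author, and pinned timestamps) except for their parent, and they still get different hashes, because the parent's hash is part of the content being hashed.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
# Pin both timestamps so only content and parents can differ.
export GIT_AUTHOR_DATE='946684800 +0000' GIT_COMMITTER_DATE='946684800 +0000'
repo=$(mktemp -d)
git init -q "$repo"

empty=$(git -C "$repo" mktree </dev/null)     # the empty tree object

# Two versions of the first commit: original and "edited".
first=$(git -C "$repo" commit-tree -m 'first' "$empty")
first_edited=$(git -C "$repo" commit-tree -m 'first (edited)' "$empty")

# A child on top of each; everything except the parent is identical.
second=$(git -C "$repo" commit-tree -p "$first" -m 'second' "$empty")
second_rewritten=$(git -C "$repo" commit-tree -p "$first_edited" -m 'second' "$empty")

git -C "$repo" cat-file -p "$second"    # includes a "parent <hash>" header line
```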

For backups you should do a "git clone --bare", which copies the internal Git structure (objects, refs, and full history) but doesn't check out a working tree of the actual files.
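A quick check of what a bare clone contains, assuming stock git and a throwaway repo. "--bare" gives you the object database and branches but no checked-out files; "--mirror" additionally copies every other ref (remote-tracking branches, notes) and configures the clone so a later "git remote update" refreshes all of them.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
work=$(mktemp -d)

git init -q "$work/project"
echo data > "$work/project/file.txt"
git -C "$work/project" add file.txt
git -C "$work/project" commit -q -m 'add data'

git clone -q --bare "$work/project" "$work/backup.git"

git --git-dir="$work/backup.git" rev-parse --is-bare-repository   # prints "true"
ls "$work/backup.git"                   # Git internals (HEAD, objects, refs, ...), no file.txt
git --git-dir="$work/backup.git" show HEAD:file.txt                # contents still recoverable
```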

I figure it's append-only at the protocol level, which is usually a smart idea for SCM. Is that still true when the whole datacenter goes down mid-operation? That typically varies from implementation to implementation of the concept.
Git is to GitHub as JavaScript is to Java. Though their names are similar they are very different things.
git != github
Hence Git/Github in my comment. I already know there's a difference. I just don't know much more than that until I learn the two.
Github is to git as Sourceforge is (or used to be) to subversion, but with a better UI.

And yes, there have been concerns raised about what would happen if Github took a turn like Sourceforge, which usually get brought up when information about new shady practices at Sourceforge comes out (or they get rehashed here).

Makes sense. I'm quite interested in seeing where it goes over time. I think it will depend a lot on the nature of the company. If it's VC-funded & aiming for acquisition, then there's a decent chance of Sourceforge history repeating. Otherwise, it might stick around as a beneficial ecosystem. Time will tell.
If you understood the difference between the two, you'd realize your comment makes no sense. The fact that github went down due to a power failure has nothing to do with git as a solution.

The fact that everyone uses git more or less the same way as svn is the problem. Git is decentralized, but because so many people rely on github, most never use the decentralized aspect of it.

If you understood my comment, you'd know I don't understand the differences between the two that well, since I haven't studied them yet. I've been clear about that in a few comments. The reason I associate them here is that most projects I see don't just use Git: they use Github, too. So I briefly wondered, and asked for feedback on, whether Github-style downtime was inherent or a configuration/deployment issue.

Several commenters helpfully described how Git can easily prevent stuff like this and that project-level tooling is where the liability comes from. That's good to know, as it's already a selling point to management types for a solution like it. We can just ensure the problem doesn't show up in a local deployment through a wiser configuration.

I understood your comment just fine, but the opinion you had formed was based on false assumptions, so I was trying to correct it, that's all.

Personally I try not to form strong opinions about things I haven't actually learned or understood yet.

All good haha