by saurik 5614 days ago
This is an interesting theory, but it isn't how git's client actually works. If the repository exported a list of "mirrors" to the client that it then stored and was willing to use, that would be awesome, but otherwise you have a million people out there who are now just getting error messages when they do "git pull" and the only fix is for them to go back to your web page and try to get information on what is happening and where else they can switch their origin. Meanwhile, if you do your own hosting you can just update your DNS to point to another box and no one is the wiser.
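Git has no such mirror-list feature, so this is only a hypothetical sketch of the client behavior described above. The URL list stands in for the imagined "mirrors" the upstream repository would export and the client would store. `fetch` is an injected placeholder for whatever actually contacts a remote; it is not a git API. The client tries each URL in order and falls back to the next one on network-style failures.

```python
from typing import Callable, List, Sequence, Tuple


class AllMirrorsFailed(Exception):
    """Raised when none of the candidate URLs could be fetched."""


def fetch_with_mirrors(
    urls: Sequence[str], fetch: Callable[[str], str]
) -> Tuple[str, str]:
    """Try each URL in order and return (url, result) for the first success.

    `urls` is a hypothetical mirror list that the upstream repository would
    have exported and the client stored; `fetch` is a placeholder for the
    real network operation.
    """
    errors: List[str] = []
    for url in urls:
        try:
            return url, fetch(url)
        except OSError as exc:  # ConnectionError, socket timeouts, etc.: try the next URL
            errors.append(f"{url}: {exc}")
    raise AllMirrorsFailed("; ".join(errors))


if __name__ == "__main__":
    # Hypothetical URLs: the GitHub copy is "down", the self-hosted one works.
    mirrors = [
        "https://github.com/example/project.git",
        "https://git.example.org/project.git",
    ]

    def fake_fetch(url: str) -> str:
        if "github.com" in url:
            raise ConnectionError("503 Service Unavailable")
        return "refs from " + url

    print(fetch_with_mirrors(mirrors, fake_fetch))
```

With a list like this, an outage of one copy falls through to the next URL and the user never has to touch their origin. The hard part, which the sketch does not address, is getting that list onto every client in the first place.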

The key problem, frankly, is that GitHub conflates two entirely unrelated things: a nice UI and social features, and a hosted version of your repo. I love the idea of outsourcing a nice UI and having cool social features, and /maybe/ to make those features work they need to have a mirror of my repository (I'm not convinced), but when people go to pull it, the URL listed should be the actual upstream ("I own the DNS on this and feel I can make this stable in the long term"), not the GitHub mirror.


This is a straw man argument. The question isn't whether Git could be more intuitive/user-friendly (Hint: it should be; in fact, I bet my company on it [see my profile]), or whether it is more secure/cheaper to host your own repo.

If you have a million people `pull`ing from your repo, of course you should be hosting your own public access point. But, in 80% of cases, people can't be bothered to figure out how to set up Gitosis, pay for slices, mess with DNS, etc. just to host a repo.

To put it another way, see: Heroku vs. EC2

This comment is totally unrelated and is itself a strawman. Yes, it is easier to use GitHub: I will not argue that fact. However, using GitHub will cause people to be pulling from GitHub, and GitHub may go down. This is a tradeoff, and one people make a lot: you use a shared platform and give up control of the URL in exchange for easier outsourced hosting. But to argue that git's decentralization solves that problem is disingenuous: it means that people could theoretically still pull your repository, but only after finding out what the fallback URL is and manually resetting their origin, which 90% of git users don't even know how to do. Meanwhile, many people are willing to spend the five minutes it takes to learn how to run their own server, and want to avoid this tradeoff by hosting their own stuff on their own hostnames so they can publish stable URLs, but /can't/, because they like GitHub's social features. None of those features (humorously, due to the aforementioned distributed features) actually require GitHub to be the canonical repository URL. Yet if you want to use GitHub, you are going to have people cloning and pulling your GitHub mirror (or, even worse, adding your GitHub mirror as a submodule), and when it goes down they are going to get errors and you will have no control over it. That sucks.
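For reference, the manual fix is a single git command, `git remote set-url origin <new-url>`, run inside each existing clone. The helper below wraps it in Python via `subprocess`, assuming a `git` executable (2.7 or later, for `remote get-url`) is on `PATH`. The function name `repoint_remote` and the example URLs are illustrative only, not part of any tool.

```python
import subprocess


def repoint_remote(repo_dir: str, new_url: str, remote: str = "origin") -> str:
    """Point an existing clone's remote at new_url and return the URL git now reports."""
    # Same as running `git remote set-url origin <new_url>` inside the clone.
    subprocess.run(
        ["git", "-C", repo_dir, "remote", "set-url", remote, new_url],
        check=True,
    )
    # Read the value back so the caller can confirm the change took effect.
    result = subprocess.run(
        ["git", "-C", repo_dir, "remote", "get-url", remote],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Calling `repoint_remote("/path/to/clone", "https://git.example.org/project.git")` should return the new URL. Note that this only fixes one clone at a time; a submodule's URL is also recorded in the parent project's `.gitmodules`, which is why that case is harder to change after the fact.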
Web services go down. It happens. You make the choice to use them anyway.

In this case, Github is very responsive about outages and clearly strives to eliminate or reduce them as much as possible.

And sometimes web services go down for good. Again, this is an understood tradeoff, and I'm not arguing that. What I do argue with is that "git is distributed" does not cancel out this particular tradeoff, which is the statement that was made by the person I am responding to. An actual solution used by many other services is "let me use my own hostnames with this service", which GitHub does not support for your repository: while their fundamental value comes from the social features and nifty git UI, they seem to be mentally stuck in a "we are the git hosting company" mindset.
I would venture to say it does.

You can't really fault Github for individual teams not opting to host their code in more than one spot online, even if Github doesn't offer the capability for users to use their own domain name for seamless switching of git hosts.

Does Github encourage keeping everything centered at Github? Perhaps implicitly. But they certainly don't lock anyone's data in, so blaming them for their customers opting to NOT put their code anywhere besides Github seems unfair.

You are conflating "hosting in multiple locations" with "claiming to be a canonical URL". If I choose to host my repository at git.saurik.com, but want to be able to use GitHub's repository browsing features, social timelines, etc., I may choose to /also/ host a copy at GitHub.

However, people are now going to copy/paste the GitHub repository URL and use that to clone my repository, and that URL is going to end up as a large number of people's origins. Even worse, that URL may end up in third-party projects as a submodule (which is much more difficult to retroactively change).

Again: the problem here is not that GitHub is somehow encouraging people to keep things at GitHub "centrally": it isn't, and the goal is not to have your data in multiple places.

In fact, that's what you need to /avoid/: there should be a single URL for "this is the git repository that we consider to be the official, canonical source for our (distributed) contributions to this project".

That URL should be one that you feel comfortable you can maintain for a long time, as that URL can end up baked into a lot of things. Some of them are theoretically easy to change (the million users who are pulling from that URL, assuming they know how to do that without just re-cloning), and some of them aren't (usages of your project as submodules in other people's projects).

To quickly put this in another, maybe simpler manner: the problem isn't that people aren't choosing to /also/ put their code in places other than GitHub; it is that putting your code /also/ in GitHub undermines your git repository URL.

But git-over-http is just a normal http client, and http supports redirects. So while nobody does this, it's possible to load-balance http clients to "valid" servers just like you would with any other http-based app.
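A minimal sketch of that idea, assuming a small front door on your own hostname that answers every request with an HTTP 302 pointing at the same path on a mirror currently believed healthy. The mirror hostnames and the hand-set health flags are hypothetical; a real setup would update them from health checks. In recent git versions the `http.followRedirects` setting defaults to `initial`, meaning git follows a redirect on the first request (`/info/refs`) and uses the redirected URL as the base for the rest of the fetch, so redirecting that first request should be enough for clone and pull.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical mirrors in preference order; True means "believed healthy".
MIRRORS = {
    "https://mirror-a.example": False,  # pretend this one is down
    "https://mirror-b.example": True,
}


def pick_mirror(mirrors):
    """Return the first mirror marked healthy, or None if all are down."""
    for base, healthy in mirrors.items():
        if healthy:
            return base
    return None


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET with a 302 to the same path on a healthy mirror."""

    def do_GET(self):
        base = pick_mirror(MIRRORS)
        if base is None:
            self.send_error(503, "no healthy mirror")
            return
        self.send_response(302)
        # Keep path and query, e.g. /project.git/info/refs?service=git-upload-pack
        self.send_header("Location", base + self.path)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


if __name__ == "__main__":
    server = ThreadingHTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
    conn.request("GET", "/project.git/info/refs?service=git-upload-pack")
    resp = conn.getresponse()
    print(resp.status, resp.getheader("Location"))
    server.shutdown()
    server.server_close()
```

The demo should print a 302 pointing at `mirror-b`. The point of the design is that the stable URL you publish belongs to the front door, so swapping mirrors is a server-side change that clients never see.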

ssh:// and git:// are more difficult, but project contributors with commit access can just ping you on IRC to see what's up with the repo and where to push to today.

That's great, but still requires doing your own setup on your own hostname: conceptually that is a single repository with one URL.