Hacker News
by tbrock 2906 days ago
Why? It’s free for open source, reasonable for enterprise and private use and I think you could build a better one starting from scratch.

GitHub’s success is largely due to the network effect and its entrenched status as the canonical code repository.

Besides, libgit2, aka “the secret sauce”, is already open source. What are you waiting for?

5 comments

Github has had stunning stability. I bet their architecture is nontrivial, that dozens of parts of their code are tied intricately to their architecture, and that opensourcing only the bits that will run on typical AMIs or on debian^Wdevuan would be little more than opensourcing helloworld.rb.
Right and I don’t want to discount that. The scale they’ve achieved and the team that delivered it are incredible but presumably the people who want it to be open source won’t need any of that.

They’ll be running it on a raspberry pi, not hosting the Rails or NodeJS repo.

> Github has had stunning stability

Has it really? I remember many threads here complaining about Github's uptime:

https://news.ycombinator.com/item?id=5808496

https://news.ycombinator.com/item?id=14893441

https://news.ycombinator.com/item?id=10984775

Compared to GitLab, it's a paragon of stability.
You can already get GitHub in an AMI; it’s called GitHub Enterprise.
Complete with all the features in today's github.com? Including load balancing, sharding, DDOS protection etc.?
> libgit2, aka “the secret sauce”

I would describe GitHub's real "secret sauce" as the issue-tracking, wikis, project boards, and release management parts, that don't get represented in the repo itself.

Which is to say, if you wanted to commoditize GitHub (which is basically what "open-sourcing your secret sauce" means), you'd have to create some sort of library that allowed you to treat a git repo + all those other things as one structured data-object. You would be able to use said library both to operate on all those pieces of data locally and to sync them between different Git hosting services that all share those features.
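
To make the "one structured data-object" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `ProjectBundle`, `Issue`, and `sync_into` are names made up for illustration, and no existing library or Git host exposes this format. It only shows side-data travelling with a repo reference, being serialized, and being copied between two hosts.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict


@dataclass
class Issue:
    number: int
    title: str
    state: str = "open"  # "open" or "closed"


@dataclass
class ProjectBundle:
    """Hypothetical container: a repo reference plus its side-data."""
    repo_url: str
    issues: Dict[int, Issue] = field(default_factory=dict)
    wiki: Dict[str, str] = field(default_factory=dict)  # page name -> text

    def to_json(self) -> str:
        # asdict() recursively converts the nested Issue objects to dicts.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "ProjectBundle":
        raw = json.loads(text)
        # JSON object keys are strings, so convert issue numbers back to int.
        issues = {int(n): Issue(**i) for n, i in raw["issues"].items()}
        return cls(raw["repo_url"], issues, raw["wiki"])


def sync_into(target: ProjectBundle, source: ProjectBundle) -> None:
    """Copy issues and wiki pages that the target does not have yet."""
    for number, issue in source.issues.items():
        target.issues.setdefault(number, issue)
    for page, text in source.wiki.items():
        target.wiki.setdefault(page, text)
```

A real version would need conflict handling (two hosts editing the same issue), which this sketch deliberately ignores.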

Or, better yet, figure out a way to put all those features into git itself, so that every git repo automatically transports those pieces of data alongside itself.

Fossil SCM has Integrated Bug Tracking, Wiki, and Technotes. It's not git, but it is distributed. It's used for the SQLite project.
That's cool. I wonder why Git hasn't copied the implementation.
They are completely different. In Fossil, commit history is sacred, whereas git users can rebase and cherry-pick to their heart's content.
I meant the implementation of these side-features as objects sitting in the repo, with appropriate client commands for creating and editing them, etc.

Git already has notes, and signed commits/signed tags, which are all that same kind of "objects that just happen to be there." So they don't need to copy Fossil's architecture; they can just copy the way that said object types interact as dependents of commits (while letting them get blown away when commits themselves do).
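
As a rough illustration of "dependents of commits that get blown away with them", here is a toy model in Python. It is not how Git stores anything; the commit ids and the `prune_dependents` helper are made up. The closest real analogue I know of is `git notes prune`, which drops notes whose objects no longer exist.

```python
from typing import Dict, List, Set


def prune_dependents(
    dependents: Dict[str, List[str]], live_commits: Set[str]
) -> Dict[str, List[str]]:
    """Keep only side-objects whose commit still exists."""
    return {
        commit: objects
        for commit, objects in dependents.items()
        if commit in live_commits
    }


# Hypothetical side-data keyed by (abbreviated, made-up) commit ids.
dependents = {
    "a1b2c3": ["issue: crash on startup"],
    "d4e5f6": ["wiki: build instructions"],
}

# Suppose a rebase rewrote history and "d4e5f6" no longer exists.
remaining = prune_dependents(dependents, live_commits={"a1b2c3"})
```

Only the entry attached to the surviving commit is kept; whether issues or wiki pages should really vanish on a rebase is exactly the design question here.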

I’m not sure. Almost every GitHub competitor does those things better: phabricator, gitlab, etc...
Maybe "secret sauce" is the wrong term. People ended up choosing GitHub purely because of network effects, I think. But those features are the "lock-in" preventing individual projects from easily migrating away.

If Git repos just "had" wikis, issues, etc. inside them, the lock-in wouldn't be there, so people would be switching between Git hosts all the time—and there wouldn't really be much value in a "git host" at all, beyond what just having a Git dir on your own server, plus a native-GUI Git client supporting the wiki/issues/etc. features, would get you.

> If Git repos just "had" wikis, issues, etc. inside them, the lock-in wouldn't be there

People clearly never cared about that, since Fossil ( https://www.fossil-scm.org/index.html/doc/trunk/www/index.wi... ) has these things and it never caught on

Maybe it didn't catch on because of other reasons.
This. Git is not a best in class solution, and the community that has coalesced around it has predictably not chosen best in class tooling.

This is okay. It's just source control at the end of the day.

Yeah good point. That’s definitely true.

I think, though, that 99%+ of repos out there have zero issues, pull requests, etc.

Maybe there’s no secret sauce of any value and it’s just pointless that github is closed source, like a form of DRM being used against us because we’re silly and let it be so, even as we aspire to see open source flourish...
I started working on this actually. https://github.com/ioquatix/relaxo is a document database built on top of git. I recently added support for using a specific branch, so it should be trivial to allow a web app doing the things you suggest to store/run within the same git repo.
The problem with trying to commoditize those additional features and create a standard is that although these different services have similarities, they also have subtle differences. I don't know if the companies that create these products, or their users, would want to standardize, since the differences exist because people have different preferences.
Exactly. I wonder how people would react if Microsoft made the GitHub source public (and available for further extension by the community) but didn't grant the license to redistribute/deploy it anywhere else. Would people be content with that, or would they be grabbing more pitchforks and protesting Microsoft's actions as some kind of toxic evil? I'm skeptical that that would make people happy, and if it doesn't, then it goes to show the ulterior motivation isn't actually just to extend the platform and "scratch my own itch", as they put it. It's to let themselves move off GitHub.
> It's to let themselves move off GitHub.

It's more than that. GitLab raised the bar here. Being able to run GitLab CE internally has de-risked the decision to test internally. For the next wave of customers in the space, familiarity with GitHub open source isn't enough.

Your reasons against "why" read more to me like "why not" as you listed things unaffected by whether they open source or not. The benefits for "why" are the same for all open source software and are not unique to this situation. The only question is if the costs are too great.
> Why?

> What are you waiting for?

If Github was made with micro services architecture, it could be split into an open source "Client side + Test-backend" and a closed source "Production-backend". The backend can be composed of an interface and multiple implementations, such as a test impl and a prod impl. “The secret sauce” could be the "Production-backend", and MS has good in-house talent operating Azure, so there is no need for help from the OSS community to improve the backend.
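
A minimal sketch of that "one interface, test impl and prod impl" split, in Python. All names here (`IssueBackend`, `InMemoryIssueBackend`, `render_issue_line`) are hypothetical and say nothing about how Github is actually built; the point is only that client-side code can be written and tested against the interface while the production implementation stays closed.

```python
from abc import ABC, abstractmethod
from typing import Dict


class IssueBackend(ABC):
    """Hypothetical interface that both test and prod backends implement."""

    @abstractmethod
    def create_issue(self, title: str) -> int:
        """Store a new issue and return its number."""

    @abstractmethod
    def get_title(self, number: int) -> str:
        """Return the title of an existing issue."""


class InMemoryIssueBackend(IssueBackend):
    """Open source "test impl": keeps everything in a dict."""

    def __init__(self) -> None:
        self._titles: Dict[int, str] = {}

    def create_issue(self, title: str) -> int:
        number = len(self._titles) + 1
        self._titles[number] = title
        return number

    def get_title(self, number: int) -> str:
        return self._titles[number]


def render_issue_line(backend: IssueBackend, number: int) -> str:
    """Client-side code: depends only on the interface."""
    return f"#{number} {backend.get_title(number)}"
```

The closed "prod impl" would be another `IssueBackend` subclass backed by the real storage, and `render_issue_line` would not need to change.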

However, the +1 button, for example, took a very long time to be implemented, even though it seems like a small change in client-side code plus some adjustments in the database. That itch led to the https://github.com/dear-github/dear-github open letter, with signers listed here: https://docs.google.com/spreadsheets/d/1oGsg02jS-PnlIMJ3OlWI...

The last sentence of the open letter says: "Hopefully none of these are a surprise to you as we’ve told you them before. We’ve waited years now for progress on any of them. If GitHub were open source itself, we would be implementing these things ourselves as a community—we’re very good at that!"

I think that's what the OSS community wants, and not only me.

This just isn't possible. No company in their right mind would take their monolith and rewrite a bunch of stuff just to please a couple randoms on the internet.

I think you're drastically underestimating the amount of code Github is powered by and how freaking long any type of refactor/rewrite would take. We're talking about years.

Oh, was Github a monolith? I did not read any articles about their architecture, so I assumed that Github could be composed of micro services, in which case splitting would be easier. I edited to "If Github was made with micro services architecture,..."

Does anyone have a link to any interview or article talking about the granularity of their architecture?

I've been reading Github Engineering blog posts related to their architecture: https://goo.gl/amkJfV From what I've read so far, Github looks like it is composed of micro services, or at least split into many distributed services.
I did a search and found http://highscalability.com/blog/2009/11/6/product-resque-git...

It's talking about a component of the Github backend called Resque (Distributed Job Queue).

From what I can see in these, the architecture is highly distributed and possibly composed of micro services.

Github has been a well known Rails monolith. Absolutely they have services that power all sorts of stuff (they have a few blog posts on it) but it's just impractical to even start a discussion on splitting it up for the reasons you initially argued for.

I've been through a number of large rewrites/reworks that took monoliths much like Github (with many many many services behind it) and split them up into modular pieces and it's an insane amount of work that can take years. You simply need very good reasons (including business reasons) to do that.

Moreover, companies at these sizes just have a LOT of code all over the place. Tooling, infra, supporting services, etc... Not to mention it's just not useful to have external contributors for a business product like Github. Doing code reviews, addressing bugs that were introduced, spending time discussing things with contributors takes an incredible amount of time.

Basically, if the reason you want Github open sourced (and reworked into some weird architecture you described) is so that people can contribute to fix things and add features... Github could/will just hire more devs to work on that.

Thinking about the fact that Github has Github.com (production service) and Github Enterprise (self hosted), could it be like this?

Github.com = Github Core + Production Services and Infra

Github Enterprise = Github Core + Services needed for Self Hosting

Maybe it's not worth proceeding based on an assumption, but if there is something like "Github Core", a codebase shared between prod and self-hosted, could open sourcing the core be an option?

Thanks TheHydroImpulse for the insights. So Github IS a monolith. It really makes sense that splitting it up is too much work just for open sourcing, and I do not see the business gain in investing money and people into it.

Then a possible path might be isolating the least coupled (and small) components of the client-side code and open sourcing those?