by lucb1e 1087 days ago
> why not try locally with git.

Because you can't. GitHub is not open source, you'd need to steal the source code to try it locally. This comment is for educational purposes only, not trying to give OP ideas!!1

But you're right in spirit of course. Would be more interesting to install Forgejo/Gitea, GitLab, GitWeb, gitolite, TortoiseGit, etc., test them on various limits, and write that up in a nice blog post for magic internet points.

4 comments

> "GitHub (and git)"

The "(and git)" portion can of course be tested locally. What OP will find is that there is no more inherent limit on the number of commits in a repo than there is on the number of nodes in a linked list.

You can go on forever till you run out of disk space. Possibly repacking will eventually require more than available memory.

Testing git, which was a stated goal, could have been done locally.

It's obvious that the author is lying about that part; he only wanted to push GitHub to its limit. But he did say git:

> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.

git runs outside of GitHub, which is what the comment you responded to was saying.

Test the behavior of git locally, without testing GitHub.

I understood the comment, but that's not what OP was testing. They were doing the commits via merging pull requests. Git has no concept of a pull request and no HTTP API. From the post:

> The GitHub API has periodic issues merging/creating PRs. (I use PRs since that is more reliable than keeping a local master up to date via pulling at this point).

> Git has no concept of a pull request.

You are confidently wrong. Git, including pull requests, was developed years before GitHub existed. GitHub borrowed the term from git. Originally (before GitHub), a pull request was a request, sent via email, that one developer pull changes from another.

https://www.git-scm.com/docs/git-request-pull

The request pull command has been part of git since 2005:

https://github.com/git/git/blob/master/git-request-pull.sh

GitHub launched in 2008.

> and no HTTP API

Also wrong:

https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP

There is nothing GitHub does with respect to git that you cannot do locally.

I'm not saying that you need GitHub for things like including parts of other repositories, but rather that the way GitHub implemented it is not code included in the git that you apt install.

I didn't know of the specific "request-pull" subcommand so thanks for that link. Still, both things you link are a bit different from how GitHub implements it, and I'd be very surprised if the HTTP API you link includes an endpoint for triggering the request-pull the way that GitHub has such APIs for their pull request mechanism.

If you meant to say that git can do anything GitHub can and we needn't use GitHub, I agree. I've used git in peer-to-peer fashion before, and especially now that it's Microsoft's, I think twice before opening repositories there. But if your main point was rather that git includes the same functionality as GitHub and that OP could have just tested the regular git instead of doing it on GitHub itself, I still think that's a rather different test target.

Just to make sure we're not talking past each other: OP wanted to test both "GitHub (and git)". OP could have tested the git portion locally.

But to engage you about the GitHub part: I believe that under the covers, GitHub is still using something substantially similar to git as the repo storage format. Git has no inherent limitations on the number of commits. Eventually you run out of disk space, and possibly memory and/or CPU during repacking. You could turn off GC and let the repo remain unpacked. You might eventually run out of inodes. During cloning (and pulling), git implicitly creates pack files, so a clone/pull will also take a long time (CPU and/or memory again) on an unpacked repo. This is why git periodically repacks.
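If you want to watch the loose-versus-packed trade-off on a local repo, something like the following sketch could help. It parses the output of "git count-objects -v", where "count" is the number of loose objects and "in-pack" the number of packed ones. The SAMPLE text is invented, and the helper names parse_count_objects and repo_stats are hypothetical. Auto GC can be switched off for such an experiment with "git config gc.auto 0".

```python
import subprocess

# Invented sample of `git count-objects -v` output for an unpacked repo.
SAMPLE = (
    "count: 1200\n"
    "size: 4800\n"
    "in-pack: 0\n"
    "packs: 0\n"
    "size-pack: 0\n"
    "prune-packable: 0\n"
    "garbage: 0\n"
    "size-garbage: 0\n"
)


def parse_count_objects(text: str) -> dict:
    """Turn 'key: value' lines into a dict, converting numeric values to int."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip anything that is not a 'key: value' line
        value = value.strip()
        stats[key.strip()] = int(value) if value.isdigit() else value
    return stats


def repo_stats(repo_path: str) -> dict:
    """Run `git count-objects -v` in repo_path (requires git on PATH)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_count_objects(result.stdout)
```

Calling repo_stats before and after "git gc" on a test repo should show objects moving from "count" to "in-pack"; every loose object is a separate file, which is where the inode concern comes from.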

If I had to guess, GitHub also has no inherent limits. Creating commits was probably triggering periodic repacking on the git backend, consuming increasing amounts of resources.

I would be surprised if the GitHub API (the Ruby on Rails code) uses many resources at all.

Creating endless PRs is something you can simulate locally with two copies of a repo. You can use "git ls-remote" against a GitHub-hosted repo with PRs in it to see how it exposes PRs as references that are not normally cloned.
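As a rough illustration of the ls-remote point, this sketch picks the pull request references out of "git ls-remote" output. GitHub publishes them as refs/pull/<number>/head (and refs/pull/<number>/merge while the PR is mergeable). The SHAs in the sample are invented, and the helper name pull_request_refs is hypothetical.

```python
import re

# GitHub exposes PRs as refs/pull/<number>/head and refs/pull/<number>/merge.
PR_REF = re.compile(r"^refs/pull/(\d+)/(head|merge)$")

# Invented sample of `git ls-remote <url>` output (tab-separated sha and ref).
SAMPLE = (
    "1111111111111111111111111111111111111111\tHEAD\n"
    "1111111111111111111111111111111111111111\trefs/heads/master\n"
    "2222222222222222222222222222222222222222\trefs/pull/1/head\n"
    "3333333333333333333333333333333333333333\trefs/pull/1/merge\n"
    "4444444444444444444444444444444444444444\trefs/pull/2/head\n"
)


def pull_request_refs(ls_remote_output: str) -> dict:
    """Map PR number -> {'head': sha, 'merge': sha} for the refs that exist."""
    prs = {}
    for line in ls_remote_output.splitlines():
        sha, _, ref = line.partition("\t")
        match = PR_REF.match(ref.strip())
        if match:
            number, kind = int(match.group(1)), match.group(2)
            prs.setdefault(number, {})[kind] = sha
    return prs
```

A default clone only fetches refs/heads/* and tags, which is why these references usually go unnoticed; they can be fetched explicitly with a refspec such as "+refs/pull/*/head:refs/remotes/origin/pr/*".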

Regardless, I think that OP could and should have satisfied their curiosity about how git works locally, especially with respect to whether it has any inherent limits. And they could have satisfied their curiosity about GitHub's resource limits with a support request.

You can download GitHub Enterprise Server for free.