Hacker News
by iBotPeaches 1088 days ago
> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.

This just doesn't seem right to me. Why? It's obvious at some point you'll harm the service. If the goal was to test it, why not try locally with git.

4 comments

A good lesson to learn - If you as a service owner aren’t testing the limits to the point of failure and enforcing sensible guardrails around that, then some random user eventually will.
GitHub offers the service for free and doesn't publish or enforce any specific limit on the number of commits. I see nothing wrong with a user pushing as many commits to it as possible. It's not his job to decide when to stop.

This is also how I feel about the Tor project getting their knickers twisted over people who do research on the live network. If the network can't handle it, then it's not resilient to attack. Asking people nicely not to do stuff that degrades your product will not make the product suddenly anti-fragile.

It's this kind of attitude that's why we can't have nice things, though.

A service is offered for free, with no documented limits or restrictions, so you push the service to its breaking point... Just to see what happens?

Well, in the case of the Tor network its whole premise is that it's resilient to attack. So either it is or it isn't. If it's resilient but only as long as people treat it nicely, then it's not actually resilient. And anyone who can demonstrate that is doing a public service. It would be irresponsible to discover a flaw and not disclose it, or to continuously exploit it. But it's not irresponsible to look for the flaw in the first place.

In the case of GitHub, it's owned by a nearly trillion dollar corporation. Nobody is hurting some mom and pop business here.

> why not try locally with git.

Because you can't. GitHub is not open source, you'd need to steal the source code to try it locally. This comment is for educational purposes only, not trying to give OP ideas!!1

But you're right in spirit of course. Would be more interesting to install Forgejo/Gitea, GitLab, GitWeb, gitolite, TortoiseGit, etc., test them on various limits, and write that up in a nice blog post for magic internet points.

> "GitHub (and git)"

The "(and git)" portion can of course be tested locally. What OP will find out is that there is no more of an inherent limit on the number of commits in a repo than there is on the number of nodes in a linked list.

You can go on forever till you run out of disk space. Possibly repacking will eventually require more than available memory.
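For anyone who wants to try the "(and git)" part at home, here is a rough sketch. It assumes git is installed and on your PATH; the function names, the committer identity, and the timestamps are all made up for illustration. It uses `git fast-import` to append empty commits to one branch, which is far faster than running `git commit` millions of times.

```python
import subprocess
from typing import Iterator


def fast_import_stream(n_commits: int,
                       branch: str = "refs/heads/master") -> Iterator[bytes]:
    """Yield a `git fast-import` stream describing n empty commits on one branch.

    No `from` line is emitted: fast-import uses the branch's current tip as
    the parent, so the commits form a simple chain, like a linked list.
    """
    for i in range(1, n_commits + 1):
        message = f"commit {i}\n".encode()
        header = (
            f"commit {branch}\n"
            # Hypothetical identity and timestamps, chosen for illustration.
            f"committer Test User <test@example.com> {1_600_000_000 + i} +0000\n"
            f"data {len(message)}\n"
        ).encode()
        yield header + message + b"\n"


def build_repo(path: str, n_commits: int) -> int:
    """Create a repo at `path`, import the commits, and return the commit count."""
    subprocess.run(["git", "init", "--quiet", path], check=True)
    proc = subprocess.Popen(["git", "-C", path, "fast-import", "--quiet"],
                            stdin=subprocess.PIPE)
    for chunk in fast_import_stream(n_commits):
        proc.stdin.write(chunk)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("git fast-import failed")
    out = subprocess.run(["git", "-C", path, "rev-list", "--count", "master"],
                         check=True, capture_output=True, text=True)
    return int(out.stdout)
```

Calling something like `build_repo("/tmp/many-commits", 1_000_000)` and raising the count until things get slow should show where disk, memory, or CPU becomes the bottleneck on your own machine rather than on someone else's.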

Testing git, which was a stated goal, could have been done locally.

It's obvious that the author is lying about that part, he only wanted to push GitHub to its limit, but he did say git:

> I decided to see how many commits GitHub (and git) could take before acting kind of wonky. At ~19 million commits (and counting) to master: it’s wonky.

git runs outside of GitHub, which is what the comment you responded to was saying.

Test the behavior of git locally, without testing GitHub.

I understood the comment, but that's not what OP was testing. They were doing the commits via merging pull requests. Git has no concept of a pull request and no HTTP API. From the post:

> The GitHub API has periodic issues merging/creating PRs. (I use PRs since that is more reliable than keeping a local master up to date via pulling at this point).

> Git has no concept of a pull request.

You are confidently wrong. Git, including pull requests, was developed years before GitHub ever existed. GitHub borrowed the term from git. Pull requests originally (before GitHub) were requests sent via email asking one developer to pull changes from another.

https://www.git-scm.com/docs/git-request-pull

The request pull command has been part of git since 2005:

https://github.com/git/git/blob/master/git-request-pull.sh

GitHub launched in 2008.

> and no HTTP API

Also wrong:

https://git-scm.com/book/en/v2/Git-on-the-Server-Smart-HTTP

There is nothing GitHub does with respect to git that you cannot do locally.

I'm not saying that you need GitHub for things like including parts of other repositories, but rather that the way GitHub implemented it is not code included in the git that you apt install.

I didn't know of the specific "request-pull" subcommand so thanks for that link. Still, both things you link are a bit different from how GitHub implements it, and I'd be very surprised if the HTTP API you link includes an endpoint for triggering the request-pull the way that GitHub has such APIs for their pull request mechanism.

If you meant to say that git can do anything GitHub can and we needn't use GitHub, I agree. I've used git in peer-to-peer fashion before, and especially now that it's Microsoft's, I think twice before opening repositories there. But if your main point was rather that git includes the same functionality as GitHub and that OP could have just tested the regular git instead of doing it on GitHub itself, I still think that's a rather different test target.

Just to make sure we're not talking past each other: OP wanted to test both "GitHub (and git)". OP could have tested the git portion locally.

But to engage you about the GitHub part: I believe that under the covers, GitHub is still using something substantially similar to git as the repo storage format. Git has no inherent limitations on the number of commits. Eventually you run out of disk space, and possibly memory and/or CPU during repacking. You could turn off GC and let the repo remain unpacked. You might eventually run out of inodes. During cloning (and pulling), git implicitly creates pack files, so a clone/pull will also take a long time (CPU and/or memory again) on an unpacked repo. This is why git periodically repacks.

If I had to guess, GitHub also has no inherent limits. Creating commits was probably triggering periodic repacks on the git backend, consuming increasing amounts of resources.
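If you want to watch the loose-versus-packed balance on a local repo, a sketch along these lines should do. It assumes git is on your PATH, and the helper names are my own. It parses `git count-objects -v`, where `count` is the number of loose objects, `in-pack` is the number of packed objects, and `packs` is the number of pack files.

```python
import subprocess
from typing import Dict


def parse_count_objects(output: str) -> Dict[str, int]:
    """Parse the `key: value` lines printed by `git count-objects -v`."""
    stats: Dict[str, int] = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        # Keep only well-formed numeric fields; skip anything unexpected.
        if sep and value.isdigit():
            stats[key.strip()] = int(value)
    return stats


def object_stats(repo_path: str) -> Dict[str, int]:
    """Run `git count-objects -v` in repo_path and return the parsed numbers."""
    result = subprocess.run(
        ["git", "-C", repo_path, "count-objects", "-v"],
        check=True, capture_output=True, text=True,
    )
    return parse_count_objects(result.stdout)
```

Running `git config gc.auto 0` in the test repo first should stop automatic repacking, so `count` keeps climbing as commits are added; running `git gc` afterwards should move those objects into `in-pack`.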

I would be surprised if the GitHub API (the Ruby on Rails code) uses many resources at all.

Creating endless PRs is something you can simulate locally with two copies of a repo. You can use "git ls-remote" against a GitHub-hosted repo with PRs in it to see how it exposes PRs as references that are not normally cloned.
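To make the `git ls-remote` point concrete, here is a small sketch. The helper names are mine, and the `refs/pull/<number>/head` and `refs/pull/<number>/merge` layout is how GitHub-hosted repos expose PRs as far as I know, so treat it as an assumption. It groups the PR refs from `ls-remote` output by PR number.

```python
import re
import subprocess
from typing import Dict

# Assumed GitHub ref layout: refs/pull/<number>/head and refs/pull/<number>/merge.
PULL_REF = re.compile(r"^refs/pull/(\d+)/(head|merge)$")


def parse_pull_refs(ls_remote_output: str) -> Dict[int, Dict[str, str]]:
    """Map PR number -> {"head": sha, "merge": sha} from `git ls-remote` output."""
    pulls: Dict[int, Dict[str, str]] = {}
    for line in ls_remote_output.splitlines():
        # Each line is "<sha>\t<refname>".
        sha, sep, ref = line.partition("\t")
        match = PULL_REF.match(ref) if sep else None
        if match:
            pulls.setdefault(int(match.group(1)), {})[match.group(2)] = sha
    return pulls


def list_pull_refs(remote_url: str) -> Dict[int, Dict[str, str]]:
    """Run `git ls-remote` against remote_url and return its PR refs."""
    result = subprocess.run(["git", "ls-remote", remote_url],
                            check=True, capture_output=True, text=True)
    return parse_pull_refs(result.stdout)
```

Pointing `list_pull_refs` at the URL of any GitHub-hosted repo with PRs should show them as plain refs that a default clone does not fetch.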

Regardless, I think that OP could and should have satisfied their curiosity about how git works locally, especially with respect to whether it has any inherent limits. And they could have answered their question about GitHub resource limits with a support request.

You can download GitHub Enterprise Server for free.

> It's obvious at some point you'll harm the service.

That’s not obvious at all. One would expect a professional service to have limits in place to prevent any negative impacts.