| HN Mirror

by BeefySwain 1087 days ago

Sidestepping all of the ethical questions of embarking on this "research", I'm surprised the number was that low.

Linux[0] itself has about 1.2 million commits, so apparently Linux is within an order of magnitude of bringing GitHub to its knees?

[0] https://github.com/torvalds/linux

4 comments

eddythompson80 1087 days ago

Microsoft’s azure-docs repo has 1.1M commits, and it’s many gigabytes in size. I made the mistake of trying to clone it to fix an issue in the docs I ran into. Ended up just editing it on GitHub because fuck that.

https://github.com/MicrosoftDocs/azure-docs

vinyl7 1087 days ago

You can clone just the latest few commits:

  git clone --depth [depth] [remote-url]

2h 1087 days ago

I don't think that works:

    > git clone --depth 1 https://github.com/MicrosoftDocs/azure-docs
    Cloning into 'azure-docs'...
    remote: Enumerating objects: 107158, done.
    remote: Counting objects: 100% (107158/107158), done.
    remote: Compressing objects: 100% (101843/101843), done.
    Receiving objects:  17% (18217/107158), 780.25 MiB | 43.72 MiB/s
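
For context, `--depth` truncates history, not the current tree: the full snapshot at the tip still has to be downloaded, which would explain a large transfer even at depth 1. A minimal sketch of what the flag does, using a throwaway local repository (the paths, file name, and commit messages are made up for illustration):

```shell
set -eu

# Build a small source repository with three commits (hypothetical content).
work=$(mktemp -d)
git init -q "$work/src"
cd "$work/src"
git config user.email "demo@example.com"
git config user.name "Demo"
git config commit.gpgsign false
for i in 1 2 3; do
  echo "revision $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Use a file:// URL: git ignores --depth for plain local paths.
git clone -q --depth 1 "file://$work/src" "$work/shallow"

# Count the commits reachable from HEAD in each repository.
full_count=$(git -C "$work/src" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full=$full_count shallow=$shallow_count"
```

The shallow clone sees only the tip commit, but it still contains every file in that commit. If the working tree itself is the bulk of the download, a partial clone (`--filter=blob:none`, possibly with `--sparse`) may help more; whether it does for this particular repo is not something I've tested.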

metabagel 1087 days ago

I think it’s a rate issue, not the number of commits.

aloer 1087 days ago

IIRC, some years ago the Homebrew repo caused too much load because of its architecture, where every client would pull on install or update. Or something like that.

AFAIK, part of GitHub's response was that they had gone as far as they could with dedicated, beefier servers, and they asked for a software fix.

I would think that if GitHub anticipates a normal repo growing this large, they can give it the same special treatment.

jwilk 1086 days ago

https://github.com/orgs/Homebrew/discussions/226

tikhonj 1087 days ago

There's a rough rule of thumb that you should expect to redesign your system to handle each order of magnitude increase in scale, and I figure it applies here too—gracefully handling that size of repo would require substantial engineering work, and they have plenty of time to handle it before human-oriented open source repos get even close to the current limit.

lucb1e 1087 days ago

I'm not sure redesigns were necessary going from 1 to 10, from 10 to 100, from 100 to 1000, from 1000 to 10'000, from 10'000 to 100'000, or from 100'000 to 1'000'000, which is where we are now. It sounds like a sensible engineering rule, but I'm not sure it translates to software, or at least not in this case. I don't know of any design changes made to Git since it was first created, there's no v1 and v2 repositories for example.

spiralx 1086 days ago

It depends on how quickly you pass through each order-of-magnitude milestone. I remember reading that MySpace grew something like five orders of magnitude in less than a year, and no matter how scalable your architecture is, at some point during growth like that you will need to rearchitect your whole system.

Slower growth allows for forward planning and incremental architectural changes.

aloer 1087 days ago

> there's no v1 and v2 repositories for example

We wouldn’t know. GitHub is probably running something very different from normal local git, including optimizations for performance and cost.

They only need to ensure API/protocol compatibility and could have already replaced everything else many times over.

aleph_minus_one 1087 days ago

> There's a rough rule of thumb that you should expect to redesign your system to handle each order of magnitude increase in scale

The version of the rule I know is: with good engineering, you can modify a system to handle an increase of one order of magnitude over what it was designed for. As soon as an increase of two orders of magnitude can occur, you had better redesign the system.
