HN Mirror

Hacker News

by BobbyTables2 4 days ago

Is that even a lot?

Spread over a year, with a generous estimate of 4 KB of data per commit, that comes out to an average throughput of a little under 2 MB/s.
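The commit count this estimate starts from isn't quoted in this thread, so here's a minimal back-of-envelope sketch that takes it as a parameter and also runs the estimate backwards. Assumptions: 4 KB is taken as 4,000 bytes, 2 MB/s as 2,000,000 bytes/s, and a year as 365 days; the function names are just for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s, ignoring leap years

def average_throughput(commits_per_year: float, bytes_per_commit: float) -> float:
    """Average bytes/second if commits were spread perfectly evenly over a year."""
    return commits_per_year * bytes_per_commit / SECONDS_PER_YEAR

def implied_commits(throughput_bps: float, bytes_per_commit: float) -> float:
    """Inverse: commits per year implied by a given average throughput."""
    return throughput_bps * SECONDS_PER_YEAR / bytes_per_commit

n = implied_commits(2_000_000, 4_000)
print(f"{n:.3g} commits/year")  # prints 1.58e+10 commits/year
```

So, taking only the two figures stated here, ~2 MB/s at 4 KB per commit corresponds to roughly 16 billion commits a year. It's an average, which is exactly the caveat raised next.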

Of course, it isn’t spread out uniformly, and there’s also a lot of hashing and other work going on.

Maybe pulls and clones drive more I/O?
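For a sense of the hashing involved: every object Git stores (blobs for file contents, trees, commits) is addressed by a hash of a short header plus the object's bytes, so a single commit touching several files produces several new hashed objects. A minimal sketch for blobs in a default SHA-1 repository, using only Python's standard library (SHA-256 repositories use SHA-256 instead):

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    # Git hashes the header "blob <size in bytes>" plus a NUL byte, followed by the raw content.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))  # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The result should match `echo 'hello world' | git hash-object --stdin`.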


3eb7988a1663 4 days ago

I suspect a cacophony of work happens when a commit hits the server: the push needs to be replicated, Git repositories need to be repacked, pull requests need their diffs recalculated, CI jobs need to run, and so on.

And that's assuming only good-faith usage. There are probably plenty of adversarial and poorly behaved scrapers putting additional load on the system.
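A toy sketch of the fan-out described above, where one push turns into several follow-up jobs. The job names, the replica count, and the `fan_out` helper are all hypothetical; this is not how GitHub's pipeline is actually structured, just an illustration of the amplification.

```python
def fan_out(repo: str, open_prs: int, replicas: int = 3) -> list[str]:
    """Expand a single push into follow-up jobs. All job names and counts are hypothetical."""
    jobs = [f"replicate:{repo}:replica{i}" for i in range(replicas)]   # copy new objects to each replica
    jobs.append(f"maybe_repack:{repo}")                                # schedule repacking/housekeeping
    jobs += [f"recompute_diff:{repo}:pr{i}" for i in range(open_prs)]  # refresh diffs of affected PRs
    jobs.append(f"run_ci:{repo}")                                      # kick off CI
    return jobs

jobs = fan_out("octo/demo", open_prs=2)
print(len(jobs))  # prints 7
```

Even in this toy, the work per push grows with the number of replicas and open pull requests, not with the few KB the commit itself adds.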


jamesfinlayson 4 days ago

Recalculate the percentage of each language in the repo, recalculate the top contributors, recalculate the stats for the committer's profile, etc.
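GitHub's language bar comes from its Linguist library, which weights languages by bytes of code. The sketch below is a heavily simplified stand-in, not Linguist's API: the extension map and `language_percentages` are hypothetical, and real detection also looks at filenames, shebangs, and vendored or generated paths.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical, tiny extension map; real detectors are far more involved.
EXTENSION_TO_LANGUAGE = {".py": "Python", ".rs": "Rust", ".go": "Go", ".ts": "TypeScript", ".c": "C", ".h": "C"}

def language_percentages(files: dict[str, int]) -> dict[str, float]:
    """Map path -> size in bytes to language -> percentage of recognised bytes."""
    totals: Counter = Counter()
    for path, size in files.items():
        lang = EXTENSION_TO_LANGUAGE.get(PurePosixPath(path).suffix)
        if lang is not None:  # unrecognised files (e.g. README.md here) are skipped
            totals[lang] += size
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {lang: 100 * n / grand_total for lang, n in totals.most_common()}

print(language_percentages({"src/main.py": 3000, "src/lib.rs": 1000, "README.md": 500}))
# prints {'Python': 75.0, 'Rust': 25.0}
```

Done naively, this walks every file in the repo, which is why recomputing it on every push adds up.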


BobbyTables2 3 days ago

Scalable algorithms and data structures have existed for decades.

Even if they had 10 billion users with 10 billion repositories, it shouldn’t be a big deal on a home PC.
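One example of the kind of approach this points at: instead of recomputing stats from full history, per-repo counters can be updated from each push's delta. A minimal single-process sketch; `RepoStats` and its methods are hypothetical, and it says nothing about distributing or replicating this work.

```python
import heapq
from collections import Counter

class RepoStats:
    """Hypothetical per-repo stats, updated incrementally one push at a time."""

    def __init__(self) -> None:
        self.commits_by_author: Counter = Counter()
        self.bytes_by_language: Counter = Counter()

    def apply_push(self, authors: list[str], byte_deltas: dict[str, int]) -> None:
        # Cost is proportional to the size of the push, not the size of the repo.
        self.commits_by_author.update(authors)        # one increment per new commit's author
        for lang, delta in byte_deltas.items():        # deltas may be negative (deleted code)
            self.bytes_by_language[lang] += delta

    def top_contributors(self, k: int) -> list[tuple[str, int]]:
        # O(authors * log k) at read time; nothing is rescanned from commit history.
        return heapq.nlargest(k, self.commits_by_author.items(), key=lambda kv: kv[1])

stats = RepoStats()
stats.apply_push(["alice", "bob", "alice"], {"Python": 1200})
stats.apply_push(["bob", "bob"], {"Python": -200, "Rust": 500})
print(stats.top_contributors(1))  # prints [('bob', 3)]
```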
