Hacker News
by BobbyTables2 4 days ago
Is that even a lot?

Spread over a year, and roughly estimating a generous 4 kB of data per commit, that comes out to a throughput of a little under 2 MB/s.
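
For anyone who wants to check that arithmetic, here is a rough sketch. The function names are made up for illustration, and the 4,000 bytes per commit is just the "generous" guess from above. The commit count it prints is only what a 2 MB/s average would imply under that guess, not a figure taken from anywhere.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 (ignores leap years)


def average_throughput_bytes_per_s(commits_per_year, bytes_per_commit=4_000):
    """Average write rate if commits were spread uniformly over a year."""
    return commits_per_year * bytes_per_commit / SECONDS_PER_YEAR


def commits_per_year_for(throughput_bytes_per_s, bytes_per_commit=4_000):
    """Inverse: how many commits per year a given average rate corresponds to."""
    return throughput_bytes_per_s * SECONDS_PER_YEAR / bytes_per_commit


if __name__ == "__main__":
    # Back-calculated from the 2 MB/s estimate; an assumption, not a sourced number.
    implied = commits_per_year_for(2_000_000)
    print(f"{implied:,.0f} commits/year")  # 15,768,000,000 commits/year
    print(f"{average_throughput_bytes_per_s(implied) / 1e6:.2f} MB/s")  # 2.00 MB/s
```

This is only the average; it says nothing about peaks, hashing, or read traffic.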

Of course, it isn’t spread out uniformly and there is also a lot of hashing and other things going on.

Maybe pulls and clones drive more I/O?


I suspect a cascade of work happens when a commit hits the server: the request needs to be replicated, Git repositories need to be repacked, pull requests need their diffs recalculated, CI jobs need to run, and so on.

That's also assuming only good-faith usage. There are probably plenty of adversarial and poorly behaved scrapers putting additional load on the system.

Recalculate the percentage of each language in the repo, recalculate the top contributors, recalculate the stats for the committer's profile, etc.
Scalable algorithms and data structures have existed for decades.
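
As a toy illustration of the kind of thing meant here, the language percentages don't have to be recomputed from scratch on every push. A running counter can be adjusted by each commit's byte deltas. This is a minimal sketch with hypothetical names (`LanguageStats`, `apply_commit`); it is not how any particular hosting service actually does it.

```python
from collections import Counter


class LanguageStats:
    """Running byte counts per language, updated per commit rather than rescanned."""

    def __init__(self):
        self.bytes_by_language = Counter()

    def apply_commit(self, deltas):
        """deltas maps language -> net bytes added by the commit (negative if removed)."""
        for language, delta in deltas.items():
            self.bytes_by_language[language] += delta
            # Drop languages that no longer have any code in the repo.
            if self.bytes_by_language[language] <= 0:
                del self.bytes_by_language[language]

    def percentages(self):
        """Share of each language as a percentage of total tracked bytes."""
        total = sum(self.bytes_by_language.values())
        if total == 0:
            return {}
        return {lang: 100.0 * n / total for lang, n in self.bytes_by_language.items()}


if __name__ == "__main__":
    stats = LanguageStats()
    stats.apply_commit({"Python": 300, "C": 100})
    print(stats.percentages())  # {'Python': 75.0, 'C': 25.0}
    stats.apply_commit({"C": -100})
    print(stats.percentages())  # {'Python': 100.0}
```

The cost per commit is proportional to the number of languages the commit touches, not to the size of the repository.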

Even if they had 10 billion users with 10 billion repositories, it shouldn’t be a big deal on a home PC.