by codeka 3118 days ago
That's still not even close to Google's repository:

"The Google codebase includes approximately one billion files and has a history of approximately 35 million commits spanning Google's entire 18-year existence. The repository contains 86TB of data, including approximately two billion lines of code in nine million unique source files."

Source: https://cacm.acm.org/magazines/2016/7/204032-why-google-stor...

1 comments

Thanks for this quote.

It prompted me to do a quick afternoon experiment with how git would handle a billion lines of code:

https://news.ycombinator.com/item?id=15892518

As another user mentioned, many git operations scale linearly with the number of changes, not with the size of the repository. Try recreating the scaled repo but, say, in commits of 1000 lines each (i.e. 200K commits), and see how long things take.
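For anyone who wants to try that, here is a rough sketch of how such a repo could be built and timed. It is not the script from the linked experiment: the helper names (commit_count, build_repo, timed), the one-new-file-per-commit layout and the tiny default size are all assumptions made for illustration, and it needs git on the PATH.

```python
import shutil
import subprocess
import tempfile
import time
from pathlib import Path


def commit_count(total_lines: int, lines_per_commit: int) -> int:
    """Number of commits needed to add total_lines, rounding up."""
    return -(-total_lines // lines_per_commit)


def git(repo: Path, *args: str) -> str:
    """Run a git command inside repo and return its stdout."""
    result = subprocess.run(
        ["git", "-C", str(repo), *args],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def build_repo(repo: Path, total_lines: int, lines_per_commit: int) -> int:
    """Create a repo in which every commit adds one new file (an assumed layout)."""
    repo.mkdir(parents=True, exist_ok=True)
    git(repo, "init", "-q")
    # Repo-local settings so commits work without any global git config.
    git(repo, "config", "user.email", "bench@example.invalid")
    git(repo, "config", "user.name", "bench")
    git(repo, "config", "commit.gpgsign", "false")
    commits = commit_count(total_lines, lines_per_commit)
    remaining = total_lines
    for i in range(commits):
        lines = min(lines_per_commit, remaining)
        remaining -= lines
        path = repo / f"file_{i:07d}.txt"
        path.write_text("".join(f"commit {i} line {j}\n" for j in range(lines)))
        git(repo, "add", path.name)
        git(repo, "commit", "-q", "-m", f"add {path.name}")
    return commits


def timed(repo: Path, *args: str) -> float:
    """Wall-clock seconds taken by one git command."""
    start = time.perf_counter()
    git(repo, *args)
    return time.perf_counter() - start


def main(total_lines: int = 20_000, lines_per_commit: int = 1_000) -> None:
    if shutil.which("git") is None:
        print("git not found on PATH, skipping")
        return
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp) / "scaled"
        commits = build_repo(repo, total_lines, lines_per_commit)
        print(f"{commits} commits")
        for cmd in (("log", "--oneline"), ("status",), ("gc", "-q")):
            print(" ".join(cmd), f"{timed(repo, *cmd):.3f}s")


if __name__ == "__main__":
    main()
```

The defaults only make 20 commits so it finishes quickly. Scaling total_lines up while holding lines_per_commit at 1000 is what would show whether the timings track commit count or repo size. Spawning two git processes per commit will itself get slow at 200K commits, so a serious run would probably want something like git fast-import instead.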
Did your experiment also make 40,000 changes per day (35 million commits of varying sizes throughout the repo) and then measure how that affects git performance? My (admittedly crappy) understanding of git is that its performance also scales with the number of commits, not just the raw file count and size.
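For a sense of scale, a back-of-the-envelope calculation using only the numbers in this thread (35 million commits, 18 years, 40,000 changes per day). The function names are made up for illustration, and the flat average ignores that the commit rate presumably grew over time.

```python
def average_per_day(total_commits: int, years: int) -> float:
    """Flat average commits per calendar day, ignoring leap days and growth."""
    return total_commits / (years * 365)


def days_to_reach(total_commits: int, commits_per_day: int) -> float:
    """Days needed to accumulate total_commits at a constant daily rate."""
    return total_commits / commits_per_day


if __name__ == "__main__":
    print(round(average_per_day(35_000_000, 18)))  # lifetime average per day
    print(days_to_reach(35_000_000, 40_000))       # days at 40,000 per day
```

This works out to roughly 5,300 commits per day averaged over the 18 years, and about 875 days of commits at the 40,000-per-day rate, so a benchmark that only imports the final state of the tree skips most of what the question is asking about.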