Hacker News
by ffsm8 224 days ago
> terabytes of source code

You sure that exists?

Git repositories that contain terabytes of source code?

I could imagine a repo that is terabytes but has binaries committed or similar... But source code?

4 comments

Google's monorepo is in fact terabytes with no binaries. It does stretch the definition of source code though - a lot of that is configuration files (at worst, text protos) which are automatically generated.
Google had 86 TB of source code data in Piper way back in 2016.
Dang, that's mind-boggling - especially if I keep in mind that a book series like The Lord of the Rings is only a few megabytes if saved as plain text.

Having 86 TB of plain text/source code - I can't fathom the scale, honestly.

Are you absolutely sure there aren't binaries in there? Honestly asking, the scale is just insane from my perspective. Even the largest book compilation, like Anna's Archive, doesn't approach that number if you strip out images. And that's pretty much all books in circulation, with multiple versions per title.

Each snapshot of the repo isn't that big, but all the snapshots together, plus all the commit metadata and such, are.
Git could never, but Piper at Google is way over that figure. Way, way over.
Microsoft has actually done a lot of work to scale Git to large repos.
It's why there's a special Microsoft Git VFS (a lot like the VFS at Google that is also referenced in the talk).

It was made to make working on Windows source code possible with Git.

Very sure, I work in one.