by kevinjahns 2139 days ago
Based on the [B4] benchmark, we can predict the size of the Yjs document representing the complete editing history of the Linux kernel (probably the largest Git repo ever created): 864 MB. The Git repository is currently 1.1 GB, so Yjs encodes the history more compactly.

If you loaded the Yjs document containing the editing history of the Linux kernel into memory, you would use about 13.8 GB. Of course, you wouldn't write the complete Linux kernel in a single file. As of 2011, the project consisted of ~37,000 files. If you represent each file as a separate Yjs document, you would use, on average, a few hundred kilobytes to load a single file (13.8 GB / 37,000 ≈ 370 KB).
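As a back-of-the-envelope check, the figures quoted above can be plugged into a few lines of Python. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes) and that memory is spread evenly across files, which real files will not follow. The constant names are made up for illustration.

```python
# Figures quoted in the post; decimal units are an assumption.
YJS_ENCODED_BYTES = 864 * 10**6      # predicted Yjs document size
GIT_REPO_BYTES = 1.1 * 10**9         # quoted Git repository size
YJS_IN_MEMORY_BYTES = 13.8 * 10**9   # predicted in-memory size
FILE_COUNT = 37_000                  # approximate file count as of 2011

# How large the encoded Yjs document is relative to the Git repository.
encoded_vs_git = YJS_ENCODED_BYTES / GIT_REPO_BYTES

# How much the document grows when loaded into memory.
memory_vs_encoded = YJS_IN_MEMORY_BYTES / YJS_ENCODED_BYTES

# Average in-memory footprint per file, assuming an even split (hypothetical).
memory_per_file_kb = YJS_IN_MEMORY_BYTES / FILE_COUNT / 1000

print(f"encoded / git:      {encoded_vs_git:.3f}")
print(f"in-memory / encoded: {memory_vs_encoded:.1f}x")
print(f"avg per file:        {memory_per_file_kb:.0f} KB")
```

Under these assumptions the encoded document is roughly 79% of the Git repository's size, loading it costs about 16x the encoded size, and the per-file average lands in the hundreds of kilobytes.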

The editing history of the Linux kernel is a very interesting benchmark resource. Maybe I will add it to crdt-benchmarks.

1 comment

> the complete editing history of the Linux kernel (probably the largest Git repo ever created)

Linux is definitely not the largest git repo ever created [0]. The big corporate monorepos are definitely larger; I know MS has moved Windows to git, and MS itself claims that repo is the largest ever created (~300GB as of 2017, per [1]). Google and Facebook both eschew git, though.

Finding data on the largest open repos is more difficult. The largest projects tend to be monorepos that implement critical operating system [2] functionality, browser engines, or compilers. The shortlist I'd make comes out to these projects (in no particular order):

* gcc

* LLVM

* Mozilla

* Chromium

* Linux

* OpenJDK

I haven't finished downloading all of these repos (my disk is begging me to stop right now), but it looks like Linux is larger than gecko-dev by a very thin margin (so a putative gecko-dev that included comm-central with its CVS history as well would easily outstrip Linux), and Chromium seems to be an order of magnitude larger than both.

[0] To be clear here, I'm mostly thinking in terms of primarily textual repositories. Repositories with large binary assets are clearly not relevant for your purposes.

[1] https://devblogs.microsoft.com/bharry/the-largest-git-repo-o..., although https://news.ycombinator.com/item?id=14411724 claims that the 300 GB measures the size of the checked-out directory on disk, not the putative size of a full .git folder.

[2] I'm including both kernel roles as well as key userspace roles. Qt and Gnome would both be on my list of putative largest repos were they monorepos, but they appear to use many small repos instead.
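The distinction in [1] between the checked-out directory and the .git folder is easy to measure on a local clone. Here is a minimal sketch, assuming a non-bare clone with a regular `.git` directory; `dir_size` and `repo_sizes` are hypothetical helper names, and the numbers depend on how well the repo is packed.

```python
import os

def dir_size(root, skip=()):
    """Sum file sizes under root, skipping directories whose name is in skip."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in skip]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):  # ignore symlinks rather than follow them
                total += os.path.getsize(path)
    return total

def repo_sizes(repo):
    """Return (history_bytes, checkout_bytes) for a non-bare clone at repo."""
    history = dir_size(os.path.join(repo, ".git"))
    checkout = dir_size(repo, skip=(".git",))
    return history, checkout
```

Calling `repo_sizes("/path/to/clone")` returns the two numbers separately, which is the comparison that matters when one source quotes the working tree and another quotes the history.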