by pandem 3025 days ago
> The Linux kernel has been developed over 25 years by thousands of contributors, so it is not at all alarming that it has grown to 1.5 GB. But if your weekend class assignment is already 1.5 GB, that’s probably a strong hint that you could be using Git more effectively!

Git is only 12 years old, so how does Linux have 25 years of history there? As far as I know, Linux used patches on mailing lists before git, are those also somehow transferred to the repo?


"The repo" only dates back to v2.6.12-rc2 shortly after the first release of git, but there are repos with imported code from previous VCSes:

https://stackoverflow.com/questions/3264283/linux-kernel-his...

https://landley.net/kdocs/fullhist/

The first link describes using git's "grafts" feature to make the UI believe that the first commit in the normal repo actually has parents. That means you can use the repo normally and agree with everyone else about commit hashes, but `git log` will also go all the way back to Linux 0.01. I had this setup on my work machine in 2012 and it was useful a few times, but in the last couple of years I haven't really needed to see history past 2.6.12.

(But yes, the history repos don't explain the size of the normal linux.git repo - except to the extent that you need to spend over a decade writing an OS to get even that many lines of code in the first commit and that much activity shortly thereafter.)

It doesn't:

    commit 1da177e4c3f41524e886b7f1b8a0c1fc7321cac2 (tag: refs/tags/v2.6.12-rc2)
    Author: Linus Torvalds <torvalds@ppc970.osdl.org>
    Date:   Sat Apr 16 15:20:36 2005 -0700

    Linux-2.6.12-rc2

    Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

> patches on mailing lists before git, are those also somehow transferred to the repo

Given that conceptually git is just a linked list of patches, I can't imagine why they wouldn't have that history

Actually, that's what VCSs before git used to be, and it's what git changed. Git doesn't keep patches; it keeps full states of the repository in a content-addressable fashion. That's one of its key insights: instead of needing an always-correct way to encode deltas, just encode the state itself and leave it to the tools to figure out what the diff should be. That way you're not baking into your disk format something that a later version of the tool could do better.
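
To make "content addressable" concrete, here is a minimal sketch of how a blob ID can be derived from file content alone. It assumes the default SHA-1 object format (not a SHA-256 repository), and `git_blob_id` is a name made up for this illustration, not a git API.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID git would assign to a blob with this content.

    Assumes the SHA-1 object format: the hash covers a short header
    ("blob <size>" followed by a NUL byte) and then the raw content.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Two versions of a file are two independent objects; no delta is recorded.
v1 = b"hello\n"
v2 = b"hello world\n"
print(git_blob_id(v1))
print(git_blob_id(v2))

# Identical content always maps to the same ID, whoever commits it.
print(git_blob_id(v1) == git_blob_id(b"hello\n"))
```

The ID depends only on the bytes, so nothing about "what changed" is encoded; any diff is worked out afterwards by comparing two such states.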

That said, git doesn't just store direct copies either. It bundles objects up into what it calls packfiles, which apply compression and delta encoding to reduce disk space and make it quicker to find a given version of a file.

https://git-scm.com/book/en/v2/Git-Internals-Packfiles

Git the tool does packfiles but that's an implementation detail. Git the VCS can work with any object storage backend.
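
A rough sketch of that separation, assuming nothing beyond "objects are keyed by a hash of their content": the store below accepts any mapping as its backend, and a compressing backend can be swapped in without changing a single object ID. `ObjectStore` and `CompressedDict` are hypothetical names for illustration, not part of git.

```python
import hashlib
import zlib
from collections.abc import MutableMapping


class ObjectStore:
    """Hypothetical content-addressed store layered over any mapping."""

    def __init__(self, backend: MutableMapping):
        self.backend = backend

    def put(self, data: bytes) -> str:
        # Key each object by a hash of its content (blob-style header assumed).
        oid = hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()
        self.backend[oid] = data
        return oid

    def get(self, oid: str) -> bytes:
        return self.backend[oid]


class CompressedDict(dict):
    """Toy backend that compresses values, standing in for a smarter encoding."""

    def __setitem__(self, key, value):
        super().__setitem__(key, zlib.compress(value))

    def __getitem__(self, key):
        return zlib.decompress(super().__getitem__(key))


plain = ObjectStore({})
packed = ObjectStore(CompressedDict())
# Same content gives the same ID, however the backend chooses to store it.
print(plain.put(b"some file content\n") == packed.put(b"some file content\n"))
```

How the bytes sit on disk is the backend's business; the IDs and the history built on them stay the same.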

Well, I don't think anyone has the complete history in terms of patches. It just doesn't exist without being reconstructed.
> Given that conceptually git is just a linked list of patches, I can't imagine why they wouldn't have that history

You can construct a linked list of patches for any series of commits, but git doesn't actually store patches or diffs - only raw content.
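
As a small illustration of reconstructing patches from stored states, the sketch below uses Python's `difflib` as a stand-in for git's own diff machinery. The function name and the `v0`, `v1` labels are made up for the example.

```python
import difflib


def patches_from_snapshots(snapshots):
    """Build one unified diff per consecutive pair of full-content snapshots."""
    patches = []
    for i in range(1, len(snapshots)):
        old = snapshots[i - 1].splitlines(keepends=True)
        new = snapshots[i].splitlines(keepends=True)
        diff = difflib.unified_diff(old, new, fromfile=f"v{i - 1}", tofile=f"v{i}")
        patches.append("".join(diff))
    return patches


# Only full states are stored; the "linked list of patches" is derived on demand.
history = ["a\nb\n", "a\nc\n", "a\nc\nd\n"]
for patch in patches_from_snapshots(history):
    print(patch)
```

Three stored states yield two patches, and a better diff algorithm could later produce different patches from the very same data.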

They had version control before Git. Git was born after BitKeeper's licensing changed.