by lucaslee 2760 days ago
I saw the Linux kernel was recommended many times here, but how many people actually read it? Where do you even start? The Linux kernel has around 60,000 files and 25 million lines of code...

I think smaller projects are better for learning purposes. If you are interested in reading some smaller projects, check out my project here https://github.com/CodeReaderMe/awesome-code-reading.

7 comments

Most of the kernel code is in the drivers. The general-purpose subsystems (VFS, I/O scheduler, task schedulers, memory management, etc.) are a small fraction of those 25 million LOC and largely independent of each other, so it is not that hard to build some understanding of them.
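One way to check the drivers claim against your own checkout is to count source lines per top-level directory. This is a rough sketch, not a proper LOC tool: the function name and the choice of extensions (.c, .h, .S) are my own assumptions, and it counts raw lines, comments and blanks included.

```python
import os
import sys
from collections import Counter


def lines_per_top_dir(root, exts=(".c", ".h", ".S")):
    """Count raw source lines per top-level directory under root."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git metadata.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files sitting directly in root belong to no subsystem
        top = rel.split(os.sep)[0]
        for name in filenames:
            if name.endswith(exts):
                # Binary mode avoids decoding errors on odd encodings.
                with open(os.path.join(dirpath, name), "rb") as f:
                    counts[top] += sum(1 for _ in f)
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    counts = lines_per_top_dir(sys.argv[1])
    total = sum(counts.values()) or 1
    for top, n in counts.most_common():
        print(f"{top:12} {n:>10} {100 * n / total:5.1f}%")
```

Run it with the path to a kernel checkout as the only argument and compare the share of drivers/ against fs/, mm/, kernel/ and the rest.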

Some ways you can start:

- Here is start_kernel(), the kernel entry point after booting up and handling the lowest level stuff in asm: https://github.com/torvalds/linux/blob/v4.19/init/main.c#L53...

- grep for SYSCALL_DEFINE to find definitions of syscalls, e.g. this is open(): https://github.com/torvalds/linux/blob/v4.19/fs/open.c#L1076

(understanding how the I/O and networking system calls work is quite helpful for application developers, even if you work in node.js, python or another high level language)

- this is the struct that represents each process in the system; you can pick an interesting field and search for where it is read and where it is updated: https://github.com/torvalds/linux/blob/v4.19/include/linux/s...
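The SYSCALL_DEFINE grep from the list above can also be scripted if you want the results in a form you can sort or filter. A minimal sketch, assuming a kernel checkout on disk; find_syscalls is a name I made up, and the regex only catches the plain SYSCALL_DEFINEn( form at the start of a line, so it will miss anything defined through other macros:

```python
import os
import re

# Matches e.g. "SYSCALL_DEFINE3(open, ..." and captures the arity and the name.
SYSCALL_RE = re.compile(r"^SYSCALL_DEFINE(\d)\((\w+)")


def find_syscalls(root):
    """Yield (name, nargs, path, lineno) for each SYSCALL_DEFINEn( in .c files."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".c"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                for lineno, line in enumerate(f, 1):
                    m = SYSCALL_RE.match(line)
                    if m:
                        yield m.group(2), int(m.group(1)), path, lineno
```

For example, `sorted(find_syscalls("fs"))` run from the top of the tree lists the syscalls defined under fs/ by name, with the file and line to jump to.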

Finally, the linux-insides book is pretty helpful: https://0xax.gitbooks.io/linux-insides/

> (understanding how the I/O and networking system calls work is quite helpful for application developers, even if you work in node.js, python or another high level language)

Being able to semi-quickly figure out how the kernel actually performs some I/O operations and what the exact semantics are is tremendously useful when working with low-level I/O applications (e.g. database-like applications).
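As a small illustration of how close those syscalls are even from Python: the os module functions below are thin wrappers over the system calls of the same name. This is only a sketch; durable_write is a hypothetical helper, and what fsync actually guarantees depends on the filesystem and hardware, which is exactly the kind of thing you end up reading kernel code for.

```python
import os


def durable_write(path, data):
    """Write bytes to path and ask the kernel to flush them to storage."""
    # open(2) (or openat(2)) with explicit flags and mode
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # write(2) may write fewer bytes than requested, so loop
            written += os.write(fd, data[written:])
        # fsync(2): request that the file's data and metadata reach the device
        os.fsync(fd)
    finally:
        # close(2)
        os.close(fd)
    return written
```

Running a script that calls this under strace shows the corresponding open/write/fsync/close calls, which you can then look up via their SYSCALL_DEFINE entries.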

Nobody ever wrote a 25MLOC program from start to finish, so I don't think it makes much sense to read it that way.

I'd read it the way it was written: from the beginning. What's Linux 0.01 look like? What's the next changeset after that release look like? What was necessary to add the driver for your favorite device? What changes were made for your particular CPU?

Programs are not static works (except maybe TeX and Metafont). They exist in the form they do in order to be amenable to changes. So look at the changes that drove it.

Read the early versions, that's exactly what I am doing too!

I read Bitcoin 0.1.5 [1], which only has 15,000 LOC and is the first tagged version on GitHub. Compared with the current Bitcoin codebase at 320,000 LOC, it's much less daunting!

[1]: https://github.com/CodeReaderMe/awesome-code-reading/issues/...

Have you actually done this? What was your experience?

Not for Linux, but it's how I approach new programs I have to work on.

I can't decipher this 1000-line function, but it came from somewhere. What did it start out as? That's what the author originally intended it to be. What caused it to grow? That's what features someone else thought it needed.
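In git terms, that archaeology might look something like the sketch below. The helper names are mine, and it assumes a git checkout with git on your PATH. It uses plain `git log` for a file's commits (reversed in Python so the oldest comes first) and git's `-L:<funcname>:<file>` option to follow a single function.

```python
import subprocess


def file_history_cmd(path):
    """git command listing commits that touched path (newest first)."""
    return ["git", "log", "--date=short", "--format=%h %ad %s", "--", path]


def function_history_cmd(func, path):
    """git command showing how one function in path changed over time."""
    return ["git", "log", f"-L:{func}:{path}"]


def file_history(path, repo="."):
    """Return one 'hash date subject' line per commit, oldest first."""
    out = subprocess.run(
        file_history_cmd(path),
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return list(reversed(out.splitlines()))
```

The first entry returned by file_history is where the file started out. The command from function_history_cmd prints diffs, so it is better run directly in a terminal than parsed.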

How long does this sort of thing take? It's a really neat approach, but it seems huge.

In my view, "reading the kernel" is not really a useful exercise. Source code isn't a book, and most code isn't really a pleasure to read by itself.

Instead I would suggest that you try to debug a problem, or try to understand what a particular syscall does, rather than starting from init/main.c. If you have a goal in mind (trying to solve a problem or understand a specific aspect of the kernel) you are far more likely to get useful information out of the exercise.

Another useful hint is that you should just ignore the majority of code that looks alien. Linux uses a lot of macros and synchronisation primitives that you probably aren't familiar with (RCU for instance). It's much more useful to take note of the things you don't understand and just move on to the code that you do understand. Most of the macros and synchronisation primitives are used all over the kernel, so you're very likely to get to grips with them over time.

I really hope that bpftrace will make it easier for people to get into kernel debugging and thus have a nicer "in" to kernel development.

I have found NTOS to be much better documented than Linux. For example,

https://github.com/markjandrews/wrk-v1.2/blob/master/base/nt...

Almost every line of code has comments.

Where you start depends on what you're after. You should start by defining questions, then search for existing documentation, then read the header files and data structures, and then perhaps some code. Also read the userspace counterparts of the interface you're studying, if applicable.

This worked fine for me when getting into the media subsystem to write camera and camera SoC interface drivers.

James Hague's recommendations [0] of classical programs are on my bucket list.

[0] https://prog21.dadgum.com/210.html

The Linux kernel has got, conservatively, thousands upon thousands of defects. The only reason to read it is in case you really need to know how it actually works, because the documentation for some syscall is wrong or your systems aren't working right in practice, or you need to know how some undocumented hardware works. Otherwise I'd say it's best left unread.