by sudarshnachakra 1718 days ago
A genuine question: why does Google use a custom kernel (with patches that could not be mainlined) when the options below are available?

1. Compiling the kernel with custom knobs
2. Writing the custom code as Linux modules

6 comments

Because originally they were (way) ahead of the mainline. Article doesn't actually say this fwiw. Nowadays mainline is mostly caught up, but it's hard to rebase (article does say this).
Because you can’t add a feature with a knob. And not all changes can be made in a module (e.g. changes to the core kernel). And even if they could, kernel internals are not stable, so you would still need to rebase your out-of-tree modules.
Not everything can be written as modules. The module API is fairly limited, and it doesn't let you arbitrarily customize the behavior of existing parts of the kernel. Examples from the article include OOM and scheduling. One I ran into myself recently is that, despite the name, Linux Security Modules (LSMs) are not loadable modules, and the LSM initialization code is unloaded after kernel boot, so even if I wanted to play tricks with unpublished APIs, the code is just not there.

A little more philosophically, if all these customization points were available as modules, the process of updating modules to work with new versions of the kernel would be exactly as much of a mess.

For OOM you do have a lot of flexibility with containers/control groups nowadays. What kind of problems were they solving with the scheduler? Is anything known about that?
For background context, a reminder that control groups were originally developed in Google’s kernel fork, and later mainlined.
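
To make the cgroup OOM point a bit more concrete, here is a minimal sketch, assuming a cgroup v2 hierarchy, where limits are set through files such as memory.max and memory.oom.group and OOM activity is reported in memory.events. The helper names (parse_memory_events, read_oom_kills) and the sample values are made up for illustration; only the file names and the "key value" line format come from the kernel's cgroup v2 interface.

```python
from pathlib import Path


def parse_memory_events(text):
    """Parse the 'key value' lines of a cgroup v2 memory.events file into a dict."""
    events = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            events[parts[0]] = int(parts[1])
    return events


def read_oom_kills(cgroup_dir):
    """Return the oom_kill counter for a cgroup directory, or None if the file is missing."""
    path = Path(cgroup_dir) / "memory.events"
    if not path.exists():
        return None
    return parse_memory_events(path.read_text()).get("oom_kill")


# Sample contents in the kernel's format; the numbers are invented for illustration.
sample = "low 0\nhigh 3\nmax 12\noom 1\noom_kill 1\n"
events = parse_memory_events(sample)
print(events["oom_kill"])
```

On a real machine you would point read_oom_kills at a directory under /sys/fs/cgroup; the exact path depends on how the container runtime or systemd lays out the hierarchy.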

As for the scheduler stuff, the main change is a SwitchTo set of syscalls that allow threads to bypass the kernel’s scheduler and just continue execution as a different thread. https://lkml.org/lkml/2020/7/22/1202

Thanks for the link, it is explained in the linked video:

https://www.youtube.com/watch?v=KXuZi9aeGTw

They explain it around 15:01: Google added its own syscall, switchto_thread, which puts the current thread to sleep and switches to the thread whose id is passed as the argument (there are some other calls too). That helps cut latency in inter-thread calls for m:n threading. The real effort is to make latency for individual application requests predictable while keeping it low.
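
A rough userspace sketch of the "sleep the current thread, run the target thread" handoff described above. This is not Google's API: Worker and switch_to are hypothetical names, and the handoff is emulated with threading.Event, so every switch still goes through the regular scheduler (wake one thread, block another). As I understand it, that round trip is exactly the overhead the syscall is meant to avoid; the sketch only illustrates the semantics.

```python
import threading


class Worker:
    """Handle for a thread that only runs when switched to (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.wake = threading.Event()


def switch_to(current, target):
    """Wake `target`, then park `current` until something switches back to it."""
    current.wake.clear()   # clear first so a fast switch-back is not lost
    target.wake.set()      # hand control to the target thread
    current.wake.wait()    # sleep until we are the target of a later switch


log = []
a, b = Worker("a"), Worker("b")


def run_a():
    a.wake.wait()                  # wait for the initial kick from main
    for i in range(3):
        log.append(("a", i))
        switch_to(a, b)
    b.wake.set()                   # release b so it can leave its last switch_to


def run_b():
    b.wake.wait()                  # wait until a switches to us the first time
    for i in range(3):
        log.append(("b", i))
        switch_to(b, a)


threads = [threading.Thread(target=run_a, daemon=True),
           threading.Thread(target=run_b, daemon=True)]
for t in threads:
    t.start()
a.wake.set()                       # start the ping-pong with thread a
for t in threads:
    t.join(timeout=5)
print(log)
```

Only one of the two threads is runnable at any moment, so the log alternates strictly between a and b, which is the cooperative behaviour an m:n threading runtime wants from a direct switch.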

The linked video briefly mentions that Google has its own futexes. What would be the difference between regular futexes and the Google implementation?
The article makes a passing reference to "Google Fibers" and "a new API for cooperative scheduling in user space."

There seems to be a Phoronix article with some more info and a link to a preparatory patchset that's public: https://www.phoronix.com/scan.php?page=news_item&px=Google-F...

Some things just don't want to be a knob. Weird stuff that no sane person would consider, until they need to, like raising the limit on how long a command line can be.
That one is not that weird and I don’t like having to use xargs to create multiple calls.
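
For anyone who has not hit this limit, here is a small sketch of what xargs is working around. arg_max asks the system for its limit via sysconf, and batch_args is a hypothetical helper that splits an argument list into chunks that fit a byte budget. It is a simplification: real xargs also accounts for the environment and per-argument pointer overhead, so treat the budget as an assumption rather than the exact kernel rule.

```python
import os


def arg_max():
    """Return the system limit on combined argument/environment size in bytes, or None."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (ValueError, OSError, AttributeError):
        return None  # not available on this platform


def batch_args(args, budget):
    """Split args into batches whose encoded size (plus one NUL byte each) fits in budget."""
    batches, current, used = [], [], 0
    for arg in args:
        size = len(os.fsencode(arg)) + 1
        if current and used + size > budget:
            batches.append(current)
            current, used = [], 0
        current.append(arg)   # an oversized single argument still gets its own batch
        used += size
    if current:
        batches.append(current)
    return batches


print(arg_max())
print(batch_args(["aa", "bb", "cc"], 6))
```

Each batch would then become one invocation of the command, which is the "multiple calls" annoyance mentioned above.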
Because it is a monolithic kernel: every feature touches lots of places, and kernel modules have a very specific set of use cases.
It is not necessarily true that features touch a lot of places. But most of the changes are improvements to existing functionality (e.g. KVM, or the scheduler) rather than something self contained.
I'd imagine drivers for lots of custom hardware that will never leave the datacenters.
But drivers for custom hw are easy to rebase. They are quite self contained.

It is when you get into kernel internals that rebasing continuously becomes a challenge. Some parts of the core Linux kernel don't change often, but I imagine many other parts see significant churn.