	by sudarshnachakra 1718 days ago
	A genuine question. Why is it that Google uses a custom kernel (with patches that could not be mainlined) when the options below are available? 1. Compiling the kernel with custom knobs 2. Writing the custom code as Linux modules

6 comments

dboreham 1718 days ago

Because originally they were (way) ahead of the mainline. Article doesn't actually say this fwiw. Nowadays mainline is mostly caught up, but it's hard to rebase (article does say this).

danobi 1718 days ago

Because you can’t add a feature with a knob. And not all changes can be made in a module (e.g. core kernel). And even if they could, kernel internals are not stable and you would still need to rebase your out-of-tree modules.

geofft 1718 days ago

Not everything can be written as modules. The module API is fairly limited, and it doesn't let you arbitrarily customize the behavior of existing parts of the kernel. Examples from the article include OOM and scheduling. One I ran into myself recently is that, despite the name, Linux Security Modules (LSMs) are not loadable modules, and the LSM initialization code is unloaded after kernel boot, so even if I wanted to play tricks with unpublished APIs, the code is just not there.

A little more philosophically, if all these customization points were available as modules, the process of updating modules to work with new versions of the kernel would be exactly as much of a mess.

MichaelMoser123 1718 days ago

For OOM you do have a lot of flexibility with containers/control groups nowadays. What kind of problems were they solving with the scheduler? Is anything known about that?
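
A minimal sketch of the kind of per-cgroup OOM control referred to here, assuming cgroup v2 and an already-created cgroup directory. `memory.max` and `memory.oom.group` are cgroup v2 interface files; the function name and the example path in the usage note are hypothetical, and writing these files on a real system needs suitable privileges.

```python
from pathlib import Path


def configure_memory_cgroup(cgroup_dir, max_bytes, kill_whole_group=True):
    """Write memory/OOM settings into an existing cgroup v2 directory.

    cgroup_dir is an assumption supplied by the caller, for example a
    directory under /sys/fs/cgroup that was created beforehand.
    """
    cgroup = Path(cgroup_dir)
    # memory.max is the hard limit: when the cgroup cannot stay under it,
    # the OOM killer acts inside this cgroup rather than system-wide.
    (cgroup / "memory.max").write_text(f"{max_bytes}\n")
    # memory.oom.group = 1 asks the kernel to kill the cgroup's processes
    # together instead of picking a single victim.
    (cgroup / "memory.oom.group").write_text("1\n" if kill_whole_group else "0\n")
```

Usage would look something like `configure_memory_cgroup("/sys/fs/cgroup/myservice", 512 * 1024 * 1024)`, where `myservice` is a made-up cgroup name.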

dodobirdlord 1718 days ago

For background context, a reminder that control groups were originally developed in Google’s kernel fork, and later mainlined.

As for the scheduler stuff, the main change is a SwitchTo set of syscalls that allow threads to bypass the kernel’s scheduler and just continue execution as a different thread. https://lkml.org/lkml/2020/7/22/1202

MichaelMoser123 1717 days ago

Thanks for the link, it is explained in the linked video:

https://www.youtube.com/watch?v=KXuZi9aeGTw They explain it around 15:01: Google added its own syscall, switchto_thread, which puts the current thread to sleep and switches to the thread whose ID is passed as the argument (plus some other calls). That one helps cut latency in inter-thread calls for m:n threading. The real effort is to make latency for individual application requests predictable while keeping it low.
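
To make the "sleep myself, run that thread" semantics concrete, here is a rough user-space emulation of the handoff pattern. It is not Google's API: `Baton`, `switch_to`, and `ping_pong` are hypothetical names, and because it is built on ordinary `threading.Event` objects, every wakeup still goes through the normal kernel scheduler, which is exactly the overhead a direct switch syscall is described as avoiding.

```python
import threading


class Baton:
    """One per thread; a thread runs only after its baton has been set."""

    def __init__(self):
        self._event = threading.Event()

    def wake(self):
        self._event.set()

    def park(self):
        # Block until someone wakes us, then reset for the next handoff.
        self._event.wait()
        self._event.clear()


def switch_to(me, target):
    """Emulated handoff: make the target runnable, then put the caller to sleep."""
    target.wake()
    me.park()


def ping_pong(rounds):
    """Two threads hand control back and forth; returns the order they ran in."""
    log = []
    a, b = Baton(), Baton()

    def worker(name, me, other):
        me.park()  # start parked until someone hands off to us
        for i in range(rounds):
            log.append(name)
            if i == rounds - 1:
                other.wake()  # final handoff: do not park again
            else:
                switch_to(me, other)

    threads = [
        threading.Thread(target=worker, args=("a", a, b)),
        threading.Thread(target=worker, args=("b", b, a)),
    ]
    for t in threads:
        t.start()
    a.wake()  # kick off the first thread
    for t in threads:
        t.join()
    return log
```

Only one of the two threads is runnable at any moment, so the log strictly alternates between them. That is the property an m:n runtime wants from a handoff, just without the cost reduction a dedicated syscall would bring.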

MichaelMoser123 1716 days ago

The linked video briefly mentions that Google has its own futexes. What would be the difference between regular futexes and the Google implementation?

geofft 1717 days ago

The article makes a passing reference to "Google Fibers" and "a new API for cooperative scheduling in user space."

There seems to be a Phoronix article with some more info and a link to a preparatory patchset that's public: https://www.phoronix.com/scan.php?page=news_item&px=Google-F...

fragmede 1718 days ago

Some things just don't want to be a knob. Weird stuff that no sane person would consider, until they need to, like raising the limit on how long a command line can be.

hinkley 1717 days ago

That one is not that weird and I don’t like having to use xargs to create multiple calls.
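
For anyone who has not hit this limit, a small sketch of what xargs is doing on your behalf: split a long argument list into batches that each fit a byte budget. `os.sysconf("SC_ARG_MAX")` is the standard way to query the limit on Unix; `arg_max` and `batch_args` are hypothetical helper names, and the size estimate is simplified (the real limit also counts the environment and per-argument pointer overhead).

```python
import os


def arg_max(default=4096):
    """Return the system's ARG_MAX, or a conservative default if unavailable."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return default


def batch_args(args, budget):
    """Split args into consecutive batches of at most `budget` estimated bytes.

    Each argument is costed as its encoded length plus one NUL terminator.
    An argument larger than the budget still gets a batch of its own.
    """
    batches, current, used = [], [], 0
    for arg in args:
        cost = len(os.fsencode(arg)) + 1
        if current and used + cost > budget:
            batches.append(current)
            current, used = [], 0
        current.append(arg)
        used += cost
    if current:
        batches.append(current)
    return batches
```

Something like `batch_args(filenames, arg_max() // 2)` leaves headroom for the environment; each batch then becomes one invocation of the command, which is the "multiple calls" being complained about.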

pjmlp 1718 days ago

Because in a monolithic kernel, every feature touches lots of places, and kernel modules have a very specific set of use cases.

bonzini 1718 days ago

It is not necessarily true that features touch a lot of places. But most of the changes are improvements to existing functionality (e.g. KVM, or the scheduler) rather than something self contained.

ndesaulniers 1718 days ago

I'd imagine drivers for lots of custom hardware that will never leave the datacenters.

CameronNemo 1718 days ago

But drivers for custom hw are easy to rebase. They are quite self contained.

It is when you get into kernel internals that rebasing continuously becomes a challenge. Some parts of the core Linux kernel don't change often, but I imagine many other parts see significant churn.
