C++ in the Linux Kernel (threatstack.com)
112 points by davikrr 1659 days ago
13 comments

> You see the problem. My C++ code expected the calling convention that pushed arguments on the stack,

that would be very weird on Linux. The x86_64 linux ABI mandates that the first arguments go on registers afaik (I'm assuming x86_64 here since the post mentions linux distros which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention ? Certainly not GCC or clang, no ?

> and the kernel expected my code to pass arguments in the registers.

so, how is that a problem with C++ and not the compiler defaults ?

> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates

bullshit it is then. https://www.youtube.com/watch?v=A_saS93Clgk

if templates are good (sometimes better even) on AVR microcontrollers with memory in kilobytes, there's no reason to not use them in a kernel meant to run on large embedded.

Also what's that rant about strings for ? In the end there is zero substance to this article, only very strange rants.

> that would be very weird on Linux. The x86_64 linux ABI mandates that the first arguments go on registers afaik (I'm assuming x86_64 here since the post mentions linux distros which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention ? Certainly not GCC or clang, no ?

The System V i386 ABI passes parameters through the stack. Perhaps that is what the author is referring to, although I wouldn't be surprised if he mixed it up with the x64 ABI.

In case it is talking about the x64 calling convention, there is actually some kind of an odd case where you would get something that looks like an argument was pushed on the stack:

When a non-trivially-copyable object is passed by value to a function, you need to ensure that throughout the lifetime of the copy, its address will never change, because the constructor may have stored the address of one of its fields (for instance, a pointer to a field). The way this is handled, at least in the System V x86_64 ABI, is that the object is created on the caller's stack and a pointer to it is passed in a register, just as if it had been passed as a pointer.

I have seen several cases where a header would have an "ifdef c++" clause with copy constructors and destructors in it ("It does not add a field so it should be OK, right ?"), which makes the object non-trivially-copyable, leading to clashing calling conventions between C and C++ code. I am curious whether this may be the issue he encountered.
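
A minimal sketch of that header scenario, for anyone who wants to poke at it. The struct and function names are hypothetical, and `std::is_trivially_copyable` is used here only as a convenient proxy for the ABI's own "non-trivial for the purposes of calls" rule, which is an assumption worth checking against the ABI document.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct as a C compiler would see it: two plain fields.
struct Plain {
    int x;
    int y;
};

// Same fields, plus the kind of C++-only members a header might hide behind
// "#ifdef __cplusplus". No field is added, so the layout stays the same.
struct Decorated {
    int x;
    int y;
    Decorated(int x_, int y_) : x(x_), y(y_) {}
    Decorated(const Decorated& other) : x(other.x), y(other.y) {}  // user-provided copy
    ~Decorated() {}                                                  // user-provided destructor
};

// The size is unchanged, which is why the change looks harmless.
static_assert(sizeof(Plain) == sizeof(Decorated), "same layout");

// But the type is no longer trivially copyable, so a C++ compiler may pass it
// by hidden pointer while a C compiler would pass the plain struct directly.
static_assert(std::is_trivially_copyable<Plain>::value, "Plain is trivial");
static_assert(!std::is_trivially_copyable<Decorated>::value, "Decorated is not");

// A by-value parameter of the decorated type still works inside C++.
int sketch_sum_fields(Decorated d) { return d.x + d.y; }
```

Within a single C++ program this is harmless; the mismatch would only show up when one side of the call is compiled as C and the other as C++.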

Too late to edit so I'll just write a message here: The hypothesis written above would not explain the author's complaints, as he mentions he was able to fix it with a compilation flag, whereas I doubt a compilation flag would change non-trivially-copyable objects being passed as a reference to a copy, since that would break correctness.
The 32-bit Linux kernel uses a register-based ABI internally, rather than System V.
> The System V i386 ABI passes parameters through the stack.

No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA dependent) parameters that can fit into a CPU register are passed via registers. The first input parameter that _can't_ fit into a register (e.g. a structure passed by value) is pushed onto the stack, with every other following parameter being pushed onto the stack as well. N+1… parameters are always passed through the stack.

> No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA dependent) parameters that can fit into a CPU register are passed via registers.

Here is the "System V i386 ABI" mentioned above: https://refspecs.linuxfoundation.org/elf/abi386-4.pdf (from https://refspecs.linuxfoundation.org/). It clearly passes all arguments on the stack, and none on registers ("Function Calling Sequence" starting on page 35). That is the ABI used on 32-bit x86 Linux if you don't specify -mregparm (which the kernel uses); since the author was calling the compiler directly (which was necessary because the kernel makefiles only have rules for building C files, not C++ files), there was a mismatch between the -mregparm used by the kernel and the default ABI used by the C++ compiler, which was fixed by also passing -mregparm to the C++ compiler.
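
To make the mismatch concrete, here is a rough sketch of how a single function can be marked with the register convention instead of relying on the global flag. `regparm` is a GCC/Clang function attribute for 32-bit x86; the macro and function names below are hypothetical, and on other targets the macro deliberately expands to nothing.

```cpp
#include <cassert>

// Hypothetical macro: on 32-bit x86, ask for up to three integer arguments in
// registers, which is what -mregparm=3 requests for every function at once.
// On any other target the macro is empty and the default ABI applies.
#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
#define SKETCH_REGPARM3 __attribute__((regparm(3)))
#else
#define SKETCH_REGPARM3
#endif

// Declaration as a caller would see it. Caller and callee must agree on the
// convention; if only one side used it, arguments would be read from the
// wrong place.
extern "C" SKETCH_REGPARM3 int sketch_add3(int a, int b, int c);

// Definition compiled with the same convention as the declaration.
extern "C" SKETCH_REGPARM3 int sketch_add3(int a, int b, int c) {
    return a + b + c;
}
```

Passing the same `-mregparm` value to both compilers, as described above, achieves this for every function without annotating each one.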

You are not incorrect, and I shall ruminate on why I had thought that the SysV ABI on i386 used %rax ÷ %rex as input function parameters without having to resort to the use of -mregparm. Thanks for the correction.
> The C/C++ ABI is quite uniform across architectures.

How can it be? What about an architecture without conventional registers? And for example I work on an implementation of C/C++ that logically uses the heap for its ABI.

Quite uniform != completely uniform.

Especially when it comes to C (less so C++), it is a remarkably adaptable language that has been able to attune to a variety of vastly incompatible hardware architectures, including stack based ones, heap based ones and some esoteric ones as well. Yet, in the case of conventional, register based ISAs, the ABI has been remarkably similar: notwithstanding actual ISA specific register names, registers 0…N (apart from RISC ISAs where storing into/loading from the register 0 is a no-op / zero constant) are used as input parameters and register 0 (where available) is used as the function return value (provided it can fit in); otherwise the return result is returned via the stack.

> Quite uniform != completely uniform

Don't know if you're a non-native speaker, but no, 'quite' usually does mean 'completely'!

https://dictionary.cambridge.org/dictionary/english/quite

> so, how is that a problem with C++ and not [...]

I don't think you should see this article as a criticism of C++. Just a rant on how hard it is to use in the Linux kernel which is openly against it.

Perhaps the article is old enough to have been written in the 32-bit era?
The Page Info I see says this:

> article:published-time 2016-10-28T11:40:06+00:00

which is well into the era of 64-bit code.

People are still complaining about macOS dropping 32bit despite the last 32bit hardware having been dropped a decade ago.

Some people (especially game devs) are bizarrely obsessed with 32bit :-/

The LDD3 mentioned is 32-bit era, and 2.6.x kernel, which had a CONFIG_REGPARM to allow passing parameters in registers (because the default was not to do that).
AVRs don't have enough storage for templated code to explode into an unmanageable problem.
Somehow C64 can deal with them.

"CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"

https://www.youtube.com/watch?v=zBkNBP00wJE

"C++20 For The Commodore 64"

https://www.youtube.com/watch?v=EIKAqcLxtT0

I love these videos, Jason is doing fantastic work with his YouTube channel and C++ Weekly series.
Check his CppCon 2021 presentation, done on a C64 emulator thanks to constexpr. :)
Thanks, I will :)
But a C64 is still only talking about 64kB of RAM. The post you're replying to claims that Template complexity gets unreasonable on a large machine, such as a Linux system. Not sure I agree, but "it's fine on a C64" isn't evidence in your favour unless you've forgotten Linux doesn't even run on a Commodore 64.
So it is fine on a 64KB system, but unmanageable on a platform that gets all the different kinds of boilerplate to run cloud workloads, got it
C++ or even C isn't exactly "fine" on any 8-bit system though. It's nice for a little demo, it can even be tolerable for some real-world projects when mixed with large amounts of inline assembly, but those 8-bit ISAs have been designed mainly for manual assembly coding, not high level compiled languages like C.
I mean, apparently this is confusing, but yes, obviously.

If your Commodore 64 template is dealing with say, foozles that might be 8-byte, 12-byte or 16-byte, the complexity incurred is pretty small, bugs with foozle<16> are likely to be something mere mortals can understand and fix.

On a more complicated system like a cloud Linux setup the template may be for foozles that can be in any colorspace and on a remote machine or locally, and now sometimes the bug with foozle<HSV,remove> involves a fifty line error message because the compiler doesn't realise all that happened is you meant to write foozle<HSV,remote> ...

It's not even as if the C++ committee isn't aware that templates are a problem. Remember template meta-programming wasn't really intended from the outset, and a big part of the point of Concepts was to at last let you write code that compilers can provide readable errors for when it's wrong.
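
A small sketch of what that buys you. The names are hypothetical. With a C++20 compiler the requirement is part of the declaration; the fallback branch is only there so the snippet also builds with older standards, and it approximates the same idea with `static_assert`.

```cpp
#include <cassert>
#include <type_traits>

#if defined(__cpp_concepts) && __cpp_concepts >= 201907L
// C++20 path: the constraint is stated up front, so a call with the wrong
// type is rejected at the call site and the diagnostic names the concept.
template <typename T>
concept SketchArithmetic = std::is_arithmetic_v<T>;

template <SketchArithmetic T>
T sketch_twice(T value) {
    return value + value;
}
#else
// Older standards: no concepts, so check the same property by hand and
// supply a readable message instead of a deep instantiation trace.
template <typename T>
T sketch_twice(T value) {
    static_assert(std::is_arithmetic<T>::value,
                  "sketch_twice requires an arithmetic type");
    return value + value;
}
#endif
```

Calling `sketch_twice` with, say, a `std::string` fails to compile in either branch; the difference is mostly in how readable the error is.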

C++ templates are unwound at the compile time before the «expanded» template code passes along into the optimiser where most of the unused code is elided.

Unless the templates have been externalised (i.e. defined as «extern template …», of course). Even then, a modern compiler+linker combo will optimise most of the unused code away at the linking time thus reducing the final binary size. I do understand that the LTO might not be available for every embedded platform, though.

P.S. That is exactly the point of the C++ template metaprogramming – the hard lifting is delegated to the compiler, which leads to increased compile times but also to more efficient and very compact runtime code.
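
The textbook illustration of the compiler doing the heavy lifting is a recursive template evaluated entirely at compile time. This is a generic sketch with hypothetical names, not anything taken from the article.

```cpp
#include <cassert>

// Primary template: each instantiation multiplies N by the result for N - 1.
template <unsigned N>
struct SketchFactorial {
    static constexpr unsigned long long value = N * SketchFactorial<N - 1>::value;
};

// Specialisation that terminates the recursion.
template <>
struct SketchFactorial<0> {
    static constexpr unsigned long long value = 1;
};

// The whole chain of instantiations is resolved by the compiler; this check
// costs nothing at runtime.
static_assert(SketchFactorial<10>::value == 3628800ULL, "computed at compile time");

// At runtime this function only has to return a constant.
unsigned long long sketch_factorial_of_ten() {
    return SketchFactorial<10>::value;
}
```

The eleven instantiations exist only during compilation; what is left in the binary is a single constant.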

What do templates have to do with storage, though? My primary attraction to C++ templates is that they let me write very expressive code that will compile down to a handful of instructions. Now, actually compiling complex C++ templates on a storage-constrained system can be a problem, since templates are compile-time beasts, not runtime. Once compiled, though, they have a Cheshire-cat existence.

Edit: Unless you're doing something rather silly with the templates, but again, that's not a template problem.

The general complaint with templates is they are instantiated and if you’re not careful can bloat the binary with multiple versions of a piece of code. But this is usually something pretty easy to solve: just don’t do something that would cause that to happen :P
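
One common way to "not do that", sketched with hypothetical names: keep the part that does not depend on the template parameter in an ordinary function, and let the template be a thin wrapper around it.

```cpp
#include <cassert>
#include <cstddef>

// The real work does not depend on the array size as a type, so it lives in
// one ordinary function that is emitted once.
long sketch_sum_impl(const int* data, std::size_t count) {
    long total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}

// Thin wrapper: each instantiation only deduces N and forwards it, so having
// many different N values adds very little code.
template <std::size_t N>
long sketch_sum_array(const int (&values)[N]) {
    return sketch_sum_impl(values, N);
}

// Two instantiations (N = 3 and N = 5) sharing the same implementation.
long sketch_sum_demo() {
    const int three[] = {1, 2, 3};
    const int five[] = {1, 2, 3, 4, 5};
    return sketch_sum_array(three) + sketch_sum_array(five);
}
```

Had the loop been written inside the template, each distinct `N` could have produced its own copy of it.
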
I have _never_ had any issue with C++ templates on _modern_ µControllers such as the ESP32. Unless you have an incredibly minuscule flash, modern GCC or LLVM are very good at deleting unused code when you compile everything with -Os. Even -Og isn't that critical either.
> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates

No, it doesn't?!? The linked article mentions templates two times (+ 2 mentions of the standard template library), once saying that templates can be used without further setup and the other times recommending that some template based data structures should be implemented. That's pretty far from "avoiding templates".

Lol, part of me likes the effort taken just because, but the kernel devs _really_ do not want C++. One hint: "struct class"

https://elixir.bootlin.com/linux/latest/source/include/linux...

What bothers me about that is that, because C doesn't have namespaces, it's already a terrible name for a struct. What if you want another "class" of thing?

Call it device_class ffs

You're dismissing the fact that the keyword collision might very well be intentional. The worst of it is that the `/sys/class` siblings `bus` and `driver`, if their internal Linux representation is actually in the `class.h` siblings, are called `struct bus_type` and `struct device_driver`.
I'd say it was extremely intentional given this:

https://lwn.net/ml/linux-api/20180905165436.GA25206@kroah.co...

And that's for a userspace header.

   #define class Class
   #include <some_linux_header.h>
   #undef class
/s
Wasn't there a period recently — around 1995 or so — when the Linux kernel had to be compiled with a C++ compiler?
Quoting from http://vger.kernel.org/lkml/#s15-3 :

"In the dark old days, in the time that most of you hadn't even heard of the word "Linux", the kernel was once modified to be compiled under g++. That lasted for a few revisions. People complained about the performance drop. It turned out that compiling a piece of C code with g++ would give you worse code. It shouldn't have made a difference, but it did. Been there, done that."

I wonder if it is still true. C++ compilers have come a long way. (as have C compilers). C++ is 99% a superset of C, I'm not sure how much of that last is used in the kernel, so it might be too much effort, but C++ is in a few cases stricter than C in ways that compilers can use to optimize. Many C programs run faster when compiled in C++ these days.

If there is a difference (either way) I'd expect it to be something you can measure, but not something you would notice in the real world on one computer. (though at google scale it probably shows up)

I don't think that's accurate. Maybe you're thinking of how it needs gcc extensions?
Why was that the case?
as long as nothing you're including includes that in c++, it shouldn't be an issue at the linker level, and thus not be an issue at all.

at least in general - if it's something that can't be handled by a c shim then you might have an issue.

It's one of the most fundamental structures in the kernel. Pretty much all driver headers include it, if indirectly.
In the past, I wrote a unix like kernel from scratch in C++. I have summarized what I had to do to get C++ code to run on bare metal in this article https://www.avabodh.com/cxxin/nostdlib.html
I've always been interested in writing my own Unix-like kernel! Could you share what resources you used to write it? How long did the whole thing take?

Just to understand the scope of the work, did you implement any of the following: memory isolation, networking, concurrency via interleaving on single thread, parallelism where n threads can run n processes simultaneously? How long did each take to get done?

I did this while I was doing my bachelor's degree. It was a four-year course and I started doing this sometime in the 2nd year and continued till the 4th year. I was not always writing code as I had to study other subjects as well. Also I was just learning coding and other computer science concepts, so it was like learning and writing code at the same time. But writing the kernel forced me to learn many computer science concepts very deeply.

At the end, what I had was a kernel which could boot on bare metal (or VM) and provided a command line interface. It had a virtual file system layer and ext2 file systems, process management (fork, exec sys call), memory management (paging and process isolation) and device drivers for keyboard and hard disk. The kernel was able to fork and exec static ELF binary.

I did not get to networking and threading. But that was the next step, which could have made it a complete unix kernel.

I implemented it in bits of assembly (nasm) and C++, so I had to learn the runtime and code generation aspects of C++. Based on that learning I wrote these articles on C++ object models and other internals. https://www.avabodh.com/cxxin/cxx.html

These are all standard C and C++ interop problems also found in userland and typically go away if the C project at hand is cooperative.
There's one problem with rants such as these: it's too easy for someone to be exposed as clueless and broadcast his lack of knowledge and assumption-heavy development process to the world. How is that as an advertisement for one's employer?
Valueless Article. Please stop posting these sort of articles which have no information content.

The article is merely a rant because the author doesn't have much of an idea of how C++ actually works. Merely knowing the syntax doesn't make one a "C++ programmer" and this is even more true when you are messing around in the Kernel. The article contains no specifics only general statements making me think this was put up to just be a "hit piece".

With all due respect, your comment is an anti-specialization rant.

> Merely knowing the syntax doesn't make one a "C++ programmer"

Does knowing all the possible abstract layers (uh, it's an ocean) make one a C++ programmer then?

> this is even more true when you are messing around in the Kernel

It's his right to mess around with the Kernel and learn things.

My comment has nothing to do with "anti-specialization" or "right to mess around" anything.

The article has zero substance with a generic rant being "i tried to use C++ to write a Kernel Module and ran into problems". There are no specifics w.r.t. C++ nor The Kernel and yet the author blames the C++ Language! Whatever is written up also betrays a certain ignorance of basic C/C++ ABI conventions leading one to surmise that the author is clueless (w.r.t. these two domains). As you can see from other comments in this thread, many others are also of the same opinion while others are guessing all over the map as to what the actual problem might be.

> A first-year computer science student can tell you that the arguments get pushed onto the stack. In other words, a call to this 3GL function results in the following assembly pseudo code

Are people this ignorant when it comes to C/C++ or any systems language? ABI & calling conventions were introduced early in my C & C++ textbooks (age 13 btw, not even close to college years).

Well, if you believed as the author did that arguments are always pushed onto the stack, you are pretty ignorant--most major architectures these days don't use the stack for arguments, at least not for the first several arguments.

(Semi-random tangent: the hardest bug I ever had the pleasure of debugging was when I discovered that the PLT glue code to load an entry into the PLT was unexpectedly clobbering a register that the calling convention said needed to be preserved. Be very, very careful using non-default calling conventions across shared object boundaries!)

I think it would depend on which system you were introduced into. Also 99% sure in my classes in the mid 90s they taught stack push. Which made sense as registers were pretty valuable. It was not until RISC came along, and register renaming, that you could consider 'wasting' them on passing args in the general case. In the 'DOS'/'Win16' world calling conventions were all over the place. You could get into trouble real quick if you did not pay attention to those calling convention modifiers. Especially if you were using libs from different compilers. In the linux world where you can control the whole stack it is easier to say 'this way and if you stray away from it, good luck'.

Small sample of the remnants of that in the DOS world. https://docs.microsoft.com/en-us/cpp/cpp/argument-passing-an...

Ab initio first years probably just about know what registers are so I can believe that.

Decoupling the compiler's optimizations and the ABI (particularly what constitutes a "move" of a struct) has derailed a few conversations I've been involved with - even from very smart devs (although mainly interpretation rather than basic misunderstandings like thinking what is actually due to the ABI is an optimization)

The article has maybe three paragraphs of actual information.
> Kernel developers obsess about speed and performance. The Linux kernel is built using -mregparm=3, which is sometimes called fastcall.

I've never messed with calling convention for the sake of performance before so I found this bit interesting. I found more info about it at: https://en.wikipedia.org/wiki/X86_calling_conventions#Borlan...

Does anyone have benchmarks? Assuming I don't care about ABI stability, what's the fastest calling convention?

> Assuming I don't care about ABI stability, what's the fastest calling convention?

I'd assume a modern optimizing compiler will, in situations where it's permitted, create completely novel calling conventions depending on the situation. Whole program optimization is one area you might see this.

The compiler tends to be limited in how it can change calling conventions by external visibility of the functions. Generally if you compile a function down to an object file the compiler will want to make that object file linkable with any other object files importing that symbol properly.

Whole program optimization gives the compiler some ways around that. I am not sure how much freedom it gives the compiler.

With LTO the compiler should be free to fudge the calling convention for most calls even between translation units.

Here is GCC doing that optimization for a static noinline function: https://godbolt.org/z/cn6Wz9Kvn

Similarly, compilers can also clone functions if it makes sense to propagate constants from call sites into the function. Example: https://godbolt.org/z/59z6xT75n
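
For readers who don't want to open the links, here is a rough sketch of the kind of code where this can happen. It is not the code behind those links, and the names are hypothetical. Because the function has internal linkage, the compiler knows every caller and is free to drop the unused parameter or bake the constant into a specialised clone (GCC typically gives such clones suffixes like `.isra` or `.constprop`).

```cpp
#include <cassert>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_NOINLINE __attribute__((noinline))
#else
#define SKETCH_NOINLINE
#endif

// Internal linkage: no other translation unit can call this, so the compiler
// does not have to honour the platform calling convention for it.
static SKETCH_NOINLINE int sketch_scale(int value, int factor, int unused) {
    (void)unused;  // never read, so it is a candidate for removal
    return value * factor;
}

// The only caller always passes factor == 3 and unused == 0, which an
// optimiser may propagate into the callee instead of passing at runtime.
int sketch_scale_by_three(int value) {
    return sketch_scale(value, 3, 0);
}
```

Whether a given compiler actually does this depends on the version and optimisation level; comparing the assembly at -O0 and -O2 is the easiest way to find out.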

I'm sure there is more room for improvement. A perfect compiler would always optimize programs as a whole and only regard function boundaries as hints at best. In practice, you have to keep complexity in check somehow.

On Linux with GCC and Clang you can use -fvisibility=internal to tell the compiler to not care at all about this and go wild with ABI. Of course it needs to be done carefully...
Compilers can clone functions (so there are two variants with different calling conventions) or even create alternate entry points.
There should be Ada in Linux kernel
and Haskell
C'mon people. Python all the way
The one true language: Threaded INTERCAL.

   PLEASE COME FROM HELL
   PLEASE DO GIVE UP
and blockchain
a complete waste of time
If my boss told me to go write a Linux device driver in C++, I'd quietly go away and deliver a working device driver that happens to stick to the C subset of C++. Trying to fiddle about getting header files to include cleanly is a complete waste of time. (Maybe you're referring to something else like reading the article.) The benefits of C++ over C that is consistent and well-written in a disciplined manner are really not as great as many managers have been led to believe. And seeking forgiveness from an idiot manager is always easier than seeking permission to do things sanely.
I guess Rust is a far better choice for that.
There is no reason that Rust doesn't share the same infrastructure problems. The main difference is that the kernel maintainers want Rust in the kernel, while if you want to maintain a module written in C++ then you are on your own.
Community wise, yes. It seems to have gained some momentum.

From a technical point of view, I'm not so sure. C++ still interoperates more easily with C than Rust does, if only because you can normally just include the headers and be done with it. (Although as the article says, there is some cleanup to do.)

From an everything point of view, there's no point in adding complexity for no tangible benefit. Rust has tangible benefits (memory safety). Very far from a silver bullet, but demonstrably better than C.

Rust isn't being experimented with in the kernel because someone decided we should really add a second language. C++ interop with C doesn't matter when there's no reason to use C++ in the kernel anyway.

The "interoperate easier" idea is a trap for both C and C++ and worth avoiding because in fact they aren't quite compatible, so you're making both languages worse to achieve this. I don't much like C++, but if you must use C++, actually use C++ and forget that it's sorta kinda "compatible" with C.
I know they are not 100% compatible, but they are 99% compatible, and that's much better than Rust.

It's easy to make it compatible by not using fields called "class", or using #ifndef __cplusplus, most C library headers are actually like that. But not the Linux kernel because they refuse it.

That's why I'm saying that the choice is not a technical one.

You can create headers that are usable from C and C++, but you have to actively maintain it that way. As C headers tend to not have function definitions in them, it's fairly easy to avoid the C-only features.

I doubt that Linux headers give a damn about usability from C++ though.
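
For reference, the usual shape of a header that is maintained for both languages looks roughly like this. Everything here is hypothetical (the header name, the struct, the function); the point is only the `extern "C"` guard and the avoidance of C++ keywords as identifiers.

```cpp
#include <cassert>

/* ---- contents of a hypothetical shared header, sketch_device.h ---- */
#ifdef __cplusplus
extern "C" {            /* give the declarations C linkage when seen by C++ */
#endif

struct sketch_device {
    int id;
    int device_class;   /* named to avoid "class", which is a C++ keyword */
};

int sketch_device_id(const struct sketch_device* dev);

#ifdef __cplusplus
}                       /* extern "C" */
#endif
/* ---- end of header ---- */

// In a real project this definition would live in a .c file; it is placed
// here only so the sketch is self-contained.
extern "C" int sketch_device_id(const struct sketch_device* dev) {
    return dev->id;
}

// C++ side: uses the struct and calls the C function through the header.
int sketch_read_id() {
    sketch_device dev = {7, 1};
    return sketch_device_id(&dev);
}
```

A header like this only stays usable from C++ if someone keeps checking it with a C++ compiler, which is the maintenance burden mentioned above.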