by jcelerier 1659 days ago
> You see the problem. My C++ code expected the calling convention that pushed arguments on the stack,

that would be very weird on Linux. The x86_64 linux ABI mandates that the first arguments go on registers afaik (I'm assuming x86_64 here since the post mentions linux distros which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention ? Certainly not GCC or clang, no ?

> and the kernel expected my code to pass arguments in the registers.

so, how is that a problem with C++ and not the compiler defaults ?

> I found the real gold mine of C++ kernel module development knowledge in OSDev.org. They have an entire article on C++ issues. That includes avoiding templates

bullshit it is then. https://www.youtube.com/watch?v=A_saS93Clgk

if templates are good (sometimes better even) on AVR microcontrollers with memory in kilobytes, there's no reason to not use them in a kernel meant to run on large embedded.

Also what's that rant about strings for ? In the end there is zero substance to this article, only very strange rants.


> that would be very weird on Linux. The x86_64 linux ABI mandates that the first arguments go on registers afaik (I'm assuming x86_64 here since the post mentions linux distros which are overwhelmingly x64). What compiler would default to a pure stack-based calling convention ? Certainly not GCC or clang, no ?

The System V i386 ABI passes parameters through the stack. Perhaps that is what the author is referring to, although I wouldn't be surprised if he mixed it up with the x64 ABI.

In case the author is talking about the x64 calling convention, there is actually an odd case where you would get something that looks like an argument was pushed on the stack:

When a non-trivially-copyable object is passed by value to a function, you need to ensure that its address never changes throughout the lifetime of the copy, because the constructor may have stored the address of one of its fields (for instance, a pointer to a field). The way this is handled, at least in the System V x86_64 ABI, is that the object is created on the caller's stack and a pointer to it is stored in a register, just as if it had been passed as a pointer.

I have seen several cases where a header would have an "ifdef c++" clause with copy constructors and destructors in it ("It does not add fields so it should be OK, right?"), which makes the object non-trivially-copyable, leading to clashing calling conventions between C and C++ code. I am curious whether this may be the issue he encountered.
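To make the point about headers concrete, here is a minimal sketch. The two struct names are hypothetical and stand in for the C view and the C++ view of the same header; the layout is identical, but the user-provided copy constructor and destructor change how the type is classified, and on the System V x86_64 ABI that changes how it is passed by value.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct as a C compiler would see it.
struct PlainHandle {
    int fd;
    int flags;
};

// Hypothetical view of the same struct with the "ifdef c++" additions.
// No fields are added, yet the type is no longer trivially copyable.
struct DecoratedHandle {
    int fd;
    int flags;
    DecoratedHandle() = default;
    DecoratedHandle(const DecoratedHandle& other) : fd(other.fd), flags(other.flags) {}
    ~DecoratedHandle() {}
};

static_assert(sizeof(PlainHandle) == sizeof(DecoratedHandle),
              "the extra members do not change the size");
static_assert(std::is_trivially_copyable<PlainHandle>::value,
              "plain struct: may be passed in registers");
static_assert(!std::is_trivially_copyable<DecoratedHandle>::value,
              "user-provided copy constructor: passed via a hidden pointer");

// Same source-level signature shape, but the argument travels differently.
int sum_plain(PlainHandle h) { return h.fd + h.flags; }
int sum_decorated(DecoratedHandle h) { return h.fd + h.flags; }

// Within a single C++ program both calls agree; the trouble only appears
// when a C caller and a C++ callee disagree about which struct they see.
void check_handles() {
    PlainHandle p{3, 4};
    DecoratedHandle d;
    d.fd = 3;
    d.flags = 4;
    assert(sum_plain(p) == sum_decorated(d));
}
```

Calling check_handles() passes on any conforming compiler; the static_asserts are the interesting part, since they show the classification changing without any change in layout.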

Too late to edit so I'll just write a message here: the hypothesis above would not explain the author's complaints, as he mentions he was able to fix it with a compilation flag, whereas I doubt a compiler flag would change how non-trivially-copyable objects are passed (as a reference to a copy), since that would break correctness.
The 32-bit Linux kernel uses a register-based ABI internally, rather than System V.
> The System V i386 ABI passes parameters through the stack.

No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA dependent) parameters that can fit into a CPU register are passed via registers. The first input parameter that _can't_ fit into a register (e.g. a structure passed by value) is pushed onto the stack, with every other following parameter being pushed onto the stack as well. N+1… parameters are always passed through the stack.

> No. The C/C++ ABI is quite uniform across architectures. The first 1..N (N is ISA dependent) parameters that can fit into a CPU register are passed via registers.

Here is the "System V i386 ABI" mentioned above: https://refspecs.linuxfoundation.org/elf/abi386-4.pdf (from https://refspecs.linuxfoundation.org/). It clearly passes all arguments on the stack, and none on registers ("Function Calling Sequence" starting on page 35). That is the ABI used on 32-bit x86 Linux if you don't specify -mregparm (which the kernel uses); since the author was calling the compiler directly (which was necessary because the kernel makefiles only have rules for building C files, not C++ files), there was a mismatch between the -mregparm used by the kernel and the default ABI used by the C++ compiler, which was fixed by also passing -mregparm to the C++ compiler.
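For illustration, the same mismatch can also be addressed per function instead of per compilation unit. This is only a sketch: the macro REGPARM3 and the function add_three are made up for the example, and the GCC/Clang regparm attribute only has an effect on 32-bit x86, so it is compiled out everywhere else.

```cpp
#include <cassert>

// regparm is a GCC/Clang function attribute for 32-bit x86 only.
// On other targets the hypothetical macro expands to nothing.
#if defined(__i386__) && (defined(__GNUC__) || defined(__clang__))
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

// Declaration as both sides of the call must see it: with regparm(3) on
// i386 the first three integer arguments go in registers instead of on
// the stack, matching code that was built with -mregparm=3.
extern "C" REGPARM3 int add_three(int a, int b, int c);

extern "C" REGPARM3 int add_three(int a, int b, int c) {
    return a + b + c;
}

void check_add_three() {
    assert(add_three(1, 2, 3) == 6);
}
```

The important part is that caller and callee agree: if only one side sees the attribute (or the flag), the callee reads its arguments from the wrong place.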

You are not incorrect, and I shall ruminate on why I had thought that the SysV ABI on i386 used %rax ÷ %rex as input function parameters without having to resort to the use of -mregparm. Thanks for the correction.
> The C/C++ ABI is quite uniform across architectures.

How can it be? What about an architecture without conventional registers? And for example I work on an implementation of C/C++ that logically uses the heap for its ABI.

Quite uniform != completely uniform.

Especially when it comes to C (less so C++), it is a remarkably adaptable language that has been able to attune to a variety of vastly incompatible hardware architectures, including stack based ones, heap based ones and some esoteric ones as well. Yet, in the case of conventional, register based ISAs, the ABI has been remarkably similar: notwithstanding actual ISA specific register names, registers 0…N (apart from RISC ISAs where storing into/loading from the register 0 is a no-op / zero constant) are used as input parameters and register 0 (where available) is used as the function return value (provided it can fit in); otherwise the return result is returned via stack.

> Quite uniform != completely uniform

Don't know if you're a non-native speaker, but no, 'quite' usually does mean 'completely'!

https://dictionary.cambridge.org/dictionary/english/quite

Only when used with non-gradable adjectives/adverbs (from the same source: https://dictionary.cambridge.org/grammar/british-grammar/qui... ) (and yes, uniform is quite non-gradable)

(non-native speaker here, quite frustrated about the quite different meanings of 'quite')

> so, how is that a problem with C++ and not [...]

I don't think you should see this article as a criticism of C++. Just a rant on how hard it is to use in the Linux kernel, which is openly against it.

Perhaps the article is old enough to have been written in the 32-bit era?
The Page Info I see says this:

> article:published-time 2016-10-28T11:40:06+00:00

which is well into the era of 64-bit code.

People are still complaining about macOS dropping 32bit despite the last 32bit hardware having been dropped a decade ago.

Some people (especially game devs) are bizarrely obsessed with 32bit :-/

The LDD3 mentioned is 32-bit era, and 2.6.x kernel, which had a CONFIG_REGPARM to allow passing parameters in registers (because the default was not to do that).
AVRs don't have enough storage for templated code to explode into an unmanageable problem.
Somehow C64 can deal with them.

"CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”"

https://www.youtube.com/watch?v=zBkNBP00wJE

"C++20 For The Commodore 64"

https://www.youtube.com/watch?v=EIKAqcLxtT0

I love these videos, Jason is doing fantastic work with his YouTube channel and C++ Weekly series.
Check his CppCon 2021 presentation, done on a C64 emulator thanks to constexpr. :)
Thanks, I will :)
But a C64 is still only talking about 64kB of RAM. The post you're replying to claims that Template complexity gets unreasonable on a large machine, such as a Linux system. Not sure I agree, but "it's fine on a C64" isn't evidence in your favour unless you've forgotten Linux doesn't even run on a Commodore 64.
So it is fine on a 64KB system, but unmanageable on a platform that gets all the different kinds of boilerplate to run cloud workloads, got it
C++ or even C isn't exactly "fine" on any 8-bit system though. It's nice for a little demo, it can even be tolerable for some real-world projects when mixed with large amounts of inline assembly, but those 8-bit ISAs have been designed mainly for manual assembly coding, not high level compiled languages like C.
Honestly unless you’re on something like an ATtiny with < 1K of RAM or doing cycle-counted stuff a properly adapted high-level language is fine. I mean, Forth (doesn’t have to but usually) uses an interpreted virtual machine and people have used and liked it on 6502s since those were the new hotness.

As far as I’ve seen, two things make C and C++ specifically problematic on 8-bitters: automatic promotion to int for all expressions, with int required to be at least 16 bits (a language problem); and subpar codegen on accumulator architectures and other things that are not like desktops (a compiler problem).
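A small sketch of the promotion issue, in case it is not obvious why it hurts on 8-bit targets. The function names are made up; the point is that adding two 8-bit values is specified as an int-sized addition, so the compiler has to prove the upper byte is unused before it can emit a single 8-bit add.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// The sum of two uint8_t operands has type int, not uint8_t.
static_assert(std::is_same<decltype(std::uint8_t{1} + std::uint8_t{2}), int>::value,
              "integer promotion widens both operands to int");

// The addition happens in int, so the result does not wrap at 256.
int widened_sum(std::uint8_t a, std::uint8_t b) {
    return a + b;
}

// Getting 8-bit wraparound requires narrowing the int result explicitly.
std::uint8_t wrapped_sum(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(a + b);
}

void check_sums() {
    assert(widened_sum(200, 200) == 400);
    assert(wrapped_sum(200, 200) == 144);  // 400 modulo 256
}
```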

C translates directly to ASM in many cases. It just makes managing offsets and other stuff easier.

C++ adds type-safety on top of that for no cost. It's great when your compiler tells you that there is no operator |=(PORTD, PINA). Did you mean |=(PORTD, PIND) or |=(PORTA, PINA)?
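A rough sketch of how that kind of check can be expressed. PortReg, PinMask and can_or_assign are hypothetical names invented for this example (they are not the avr-libc definitions of PORTD or PINA); the idea is just to tag each register type with its bank so that mixing banks fails to compile.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical strongly typed pin mask, tagged with its port bank.
template <char Bank>
struct PinMask {
    std::uint8_t bits;
};

// Hypothetical port register that only accepts masks from the same bank.
template <char Bank>
struct PortReg {
    std::uint8_t value = 0;
    PortReg& operator|=(PinMask<Bank> mask) {
        value = static_cast<std::uint8_t>(value | mask.bits);
        return *this;
    }
};

// Detects at compile time whether `port |= mask` is well-formed.
template <class Port, class Mask, class = void>
struct can_or_assign : std::false_type {};

template <class Port, class Mask>
struct can_or_assign<Port, Mask,
                     decltype(void(std::declval<Port&>() |= std::declval<Mask>()))>
    : std::true_type {};

static_assert(can_or_assign<PortReg<'D'>, PinMask<'D'>>::value,
              "same bank: allowed");
static_assert(!can_or_assign<PortReg<'D'>, PinMask<'A'>>::value,
              "mixed banks: rejected by the compiler");

// The wrapper is just an OR on one byte, so it should cost nothing extra
// once the optimiser has inlined it.
std::uint8_t set_bit_on_port_d(unsigned bit) {
    PortReg<'D'> portd;
    portd |= PinMask<'D'>{static_cast<std::uint8_t>(1u << bit)};
    return portd.value;
}

void check_port() {
    assert(set_bit_on_port_d(3) == 8);
}
```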

I mean, apparently this is confusing, but yes, obviously.

If your Commodore 64 template is dealing with say, foozles that might be 8-byte, 12-byte or 16-byte, the complexity incurred is pretty small, bugs with foozle<16> are likely to be something mere mortals can understand and fix.

On a more complicated system like a cloud Linux setup the template may be for foozles that can be in any colorspace and on a remote machine or locally, and now sometimes the bug with foozle<HSV,remove> involves a fifty line error message because the compiler doesn't realise all that happened is you meant to write foozle<HSV,remote> ...

It's not even as if the C++ committee isn't aware that templates are a problem. Remember template meta-programming wasn't really intended from the outset, and a big part of the point of Concepts was to at last let you write code that compilers can provide readable errors for when it's wrong.

C++ templates are instantiated at compile time, and the «expanded» template code is then passed along to the optimiser, where most of the unused code is elided.

Unless the templates have been externalised (i.e. declared as «extern template …»), of course. Even then, a modern compiler and linker combination will optimise most of the unused code away at link time, thus reducing the final binary size. I do understand that LTO might not be available for every embedded platform, though.
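For anyone who has not used «extern template», a minimal single-file sketch follows. Accumulator and sum_to are made-up names; in a real project the declaration would sit in the header and the definition in exactly one source file, but having both in one translation unit is allowed as long as the definition comes after the declaration.

```cpp
#include <cassert>

template <typename T>
struct Accumulator {
    T total;
    Accumulator() : total() {}
    void add(T x) { total += x; }
    T get() const { return total; }
};

// Explicit instantiation declaration (normally in the header): tells the
// compiler not to instantiate Accumulator<int> implicitly in this file.
extern template struct Accumulator<int>;

// Explicit instantiation definition (normally in exactly one .cpp file):
// emits all members of Accumulator<int>, whether they are used or not.
template struct Accumulator<int>;

int sum_to(int n) {
    Accumulator<int> acc;
    for (int i = 1; i <= n; ++i) {
        acc.add(i);
    }
    return acc.get();
}

void check_sum_to() {
    assert(sum_to(4) == 10);
}
```

Because the explicit definition emits every member, any that end up unused are what the linker (or LTO) has to strip afterwards.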

P.S. That is exactly the point of C++ template metaprogramming – the heavy lifting is delegated to the compiler, which leads to increased compile times but also to more efficient and very compact runtime code.

What do templates have to do with storage, though? My primary attraction to C++ templates is that they let me write very expressive code that will compile down to a handful of instructions. Now, actually compiling complex C++ templates on a storage-constrained system can be a problem, since templates are compile-time beasts, not runtime. Once compiled, though, they have a Cheshire-cat existence.

Edit: Unless you're doing something rather silly with the templates, but again, that's not a template problem.
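A tiny sketch of the "Cheshire-cat existence", assuming an 8-bit register as the target. BitMask and led_mask are hypothetical names; all of the template recursion is resolved by the compiler, so the function should reduce to returning a constant.

```cpp
#include <cassert>
#include <cstdint>

// Builds an 8-bit mask from a list of bit positions, entirely at compile time.
template <unsigned... Bits>
struct BitMask;

// Base case: no bits set.
template <>
struct BitMask<> {
    static constexpr std::uint8_t value = 0;
};

// Recursive case: set the first bit, then fold in the rest.
template <unsigned First, unsigned... Rest>
struct BitMask<First, Rest...> {
    static_assert(First < 8, "bit index must fit in an 8-bit register");
    static constexpr std::uint8_t value =
        static_cast<std::uint8_t>((1u << First) | BitMask<Rest...>::value);
};

// Verified by the compiler: bits 0, 3 and 7 give 1 + 8 + 128.
static_assert(BitMask<0, 3, 7>::value == 0x89, "mask is computed at compile time");

// At run time nothing of the template machinery is left, only the constant.
std::uint8_t led_mask() {
    return BitMask<0, 3, 7>::value;
}

void check_mask() {
    assert(led_mask() == 0x89);
}
```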

The general complaint with templates is they are instantiated and if you’re not careful can bloat the binary with multiple versions of a piece of code. But this is usually something pretty easy to solve: just don’t do something that would cause that to happen :P
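One common way to avoid the duplication, sketched with made-up names: keep the type-independent work in an ordinary function that is compiled once, and let the template be a thin wrapper that only supplies the address and size. Each instantiation is then a single call rather than a full copy of the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Non-template core: exists once in the binary, whatever T is.
std::size_t count_nonzero_bytes(const unsigned char* data, std::size_t size) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (data[i] != 0) {
            ++count;
        }
    }
    return count;
}

// Thin template wrapper: every instantiation only forwards to the core.
template <typename T>
std::size_t nonzero_bytes_in(const T& object) {
    return count_nonzero_bytes(reinterpret_cast<const unsigned char*>(&object),
                               sizeof(T));
}

void check_nonzero_bytes() {
    // Two of the four bytes are 0xFF, regardless of byte order.
    assert(nonzero_bytes_in(std::uint32_t{0x00FF00FF}) == 2);
    assert(nonzero_bytes_in(std::uint16_t{0}) == 0);
}
```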
I have _never_ had any issue with C++ templates on _modern_ µControllers such as the ESP32. Unless you have an incredibly minuscule flash, modern GCC or LLVM are very good at deleting unused code when you compile everything with -Os. Even -Og isn't that critical either.