by cominous 2962 days ago
1. Do you see the bloat even if you don't use post-C++11 features but compile using the C++17 standard?

Yes, actually I tried using various C++ snippets and even reported that to the GCC compiler team. It happens with simple stuff like std::string and std::vector. The response was something like: there really does seem to be bloat, but no performance impact, and I guess most users outside of embedded don't care too much about the size of the compiled binary.

2. Do you think it is mostly the compiler alone that is causing the bloat? Or is it stuff from the standard library header files that somehow gets linked in (and is not used or needed by your software)?

That's actually a very good question I can't give an answer to, meaning I haven't looked specifically into that.

When C++17 came to GCC I played with the compiler explorer and observed this by just switching the gcc/clang version and the -std flag. Actually, you can try it yourself: https://godbolt.org/
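
For anyone who wants to try this, here is a minimal sketch of the kind of snippet that could be pasted into Compiler Explorer. It is not the code from the original GCC report; the function name `total_length` and the flag combinations in the comments are only assumptions for illustration. It uses nothing newer than C++11, so only the -std flag changes between runs.

```cpp
// Hypothetical test snippet: only std::string and std::vector, no post-C++11 features.
// Suggested comparison in Compiler Explorer (same compiler version, only -std changes):
//   -Os -std=c++14
//   -Os -std=c++17
#include <cstddef>
#include <string>
#include <vector>

// Builds n strings of length 1..n and returns their combined length.
std::size_t total_length(int n) {
    std::vector<std::string> words;
    for (int i = 0; i < n; ++i) {
        words.push_back(std::string(static_cast<std::size_t>(i + 1), 'x'));
    }
    std::size_t total = 0;
    for (const std::string& w : words) {
        total += w.size();
    }
    return total;
}
```

Comparing the number of assembly lines (or the size of the object file) between the two -std settings should show whether the difference appears with a given compiler version.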


Did you try with optimisations enabled? Here is a (trivial) program comparing gcc output between -std=c++14 and -std=c++17: https://godbolt.org/g/VLDhYf. Note that the output code size with -std=c++17 is significantly larger without optimisations (default), but it is identical with optimisations turned on!
I used -Os, as I do for any embedded code where I don't care too much about performance and need a compact binary instead.

I would love to post my code snippets from back then, but I'm not home currently, and by the time I'm back home I guess nobody will care about this anymore :D. Maybe I'll put it into an article.

If I had to guess, -Os prevents most inlining, and inlining is pretty much required to remove most of the pure compile-time abstraction and indirection that is used in even the most trivial C++ libraries. Very likely libstdc++ makes significant use of that.

The intermediate inlining stages greatly expand the code size, up to the point where all the abstraction can be compiled away. I guess that -Os simply settles for a local optimum and gives up inlining early.
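
To illustrate the kind of layering meant here, a small sketch follows. It is not taken from libstdc++; `View`, `for_each_in` and `sum` are made-up names. Each layer is a separate function that only disappears if the compiler inlines it, which is the guess above about why giving up on inlining early could leave more code behind.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical thin wrapper: pure compile-time abstraction over a pointer range.
template <typename T>
class View {
public:
    View(const T* data, std::size_t size) : data_(data), size_(size) {}
    const T* begin() const { return data_; }
    const T* end() const { return data_ + size_; }
private:
    const T* data_;
    std::size_t size_;
};

// Generic loop helper: one more call layer around the callback.
template <typename Range, typename Fn>
void for_each_in(const Range& range, Fn fn) {
    for (const auto& x : range) {
        fn(x);
    }
}

// Goes through View::begin/end, for_each_in and the lambda's call operator.
// If every layer is inlined, this can reduce to a plain summing loop;
// if not, each layer may remain as an out-of-line function.
int sum(const std::vector<int>& v) {
    int total = 0;
    View<int> view(v.data(), v.size());
    for_each_in(view, [&total](int x) { total += x; });
    return total;
}
```

Compiling `sum` with different optimisation levels in Compiler Explorer would be one way to check how many of these helper functions survive in the output.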

-Os is pretty much useless. For comparison, I compiled my (reasonably sized) project with -Os and got a 12MB statically linked binary. But with -O3, the binary size is only 4.2MB. Not only does -O3 produce faster code, the result is also smaller.
In modern systems with caches and a typical penalty for going out to RAM, is it still possible for larger code to be faster?
Yes. Caches actually work. And work quite well for code. Of course there are pathological cases.

Also, static size does not mean anything. The only thing that matters is the dynamic size, i.e. the instructions that are actually fetched at runtime: code that isn't run or is run rarely doesn't matter (then again, such code is a prime candidate to be compiled with -Os).

To answer #2, you can try enabling LTO (link-time optimization) and see if that helps with binary size.