by cominous 2964 days ago
C++17 is definitely going in the right direction for most applications, but I have the feeling that the compiler implementations cannot keep up with the pace of modernization.

We use C++ for embedded devices and have seen steady code bloat with every release since C++11 (especially with C++17), even without using any of the new features (with gcc/clang). This is a trust-killer and the reason we stay on C++11 for embedded development.


I would be really interested if you can shed a little light on the following related questions:

1. Do you see the bloat even if you don't use post C++11 features but compile using the C++17 standard?

2. Do you think the compiler alone is causing most of the bloat? Or is it stuff from the standard library header files that somehow gets linked in (and is not used or needed by your software)?

> 1. Do you see the bloat even if you don't use post C++11 features but compile using the C++17 standard?

Yes. I tried various C++ snippets and even reported it to the GCC team. It happens with simple stuff like std::string and std::vector. The response was roughly that there really does seem to be bloat, but no performance impact, and I guess most users outside of embedded don't care much about the size of the compiled binary.

> 2. Do you think the compiler alone is causing most of the bloat? Or is it stuff from the standard library header files that somehow gets linked in (and is not used or needed by your software)?

That's actually a very good question that I can't answer, meaning I haven't looked into it specifically.

When C++17 came to GCC, I played with Compiler Explorer and observed this just by switching the gcc/clang version and the -std flag. You can try it yourself: https://godbolt.org/

Did you try with optimisations enabled? Here is a (trivial) program comparing gcc output between -std=c++14 and -std=c++17: https://godbolt.org/g/VLDhYf. Note that the output code size with -std=c++17 is significantly larger without optimisations (default), but it is identical with optimisations turned on!
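For anyone who wants to repeat the comparison locally, here is a minimal probe, assuming a GCC-style toolchain. It is not the program behind the link above; the file name probe.cpp and the functions make_labels and total_length are made up for illustration. It only touches std::string and std::vector, the two types mentioned earlier in the thread, and what the size difference looks like will depend on your compiler version and flags.

```cpp
#include <cassert>  // only needed for the host-side check in the note below
#include <cstddef>
#include <string>
#include <vector>

// Build the same file twice and compare the section sizes, for example:
//   g++ -Os -std=c++14 -c probe.cpp -o probe14.o && size probe14.o
//   g++ -Os -std=c++17 -c probe.cpp -o probe17.o && size probe17.o
// (probe.cpp is a hypothetical file name; swap -Os for -O2 or -O0 to see
// how much the optimisation level changes the picture.)

// Builds "sensor-0", "sensor-1", ... using only C++11-era library calls.
std::vector<std::string> make_labels(std::size_t count) {
    std::vector<std::string> labels;
    labels.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        labels.push_back("sensor-" + std::to_string(i));
    }
    return labels;
}

// Sums the lengths so the optimiser cannot discard the strings entirely.
std::size_t total_length(const std::vector<std::string>& labels) {
    std::size_t total = 0;
    for (const std::string& s : labels) {
        total += s.size();
    }
    return total;
}
```

On a host build, a quick sanity check is assert(total_length(make_labels(3)) == 24), since each of the three labels is eight characters long.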
I used -Os, as I do for any embedded code where I don't care too much about performance and need a compact binary instead.

I would love to post my code snippets from back then, but I'm not home currently, and by the time I'm back home I guess nobody will care about this anymore :D. Maybe I'll put it into an article.

If I had to guess, -Os prevents most inlining, and inlining is pretty much required to remove the pure compile-time abstraction and indirection that is used in even the most trivial C++ libraries. Very likely libstdc++ makes significant use of that.

The intermediate inlining stages greatly expand the code size, up to the point where all the abstraction can be compiled away. I guess that -Os simply settles for a local optimum and gives up on inlining early.

-Os is pretty much useless. For comparison, I compiled my (reasonably sized) project with -Os and got a 12MB statically linked binary, but with -O3 the binary is only 4.2MB. Not only does -O3 produce faster code, the result is also smaller.
In modern systems with caches and a typical penalty for going out to RAM, is it still possible for larger code to be faster?
To answer #2, you can try enabling LTO and see if that helps with binary size
I haven't tried C++17 yet, but C++98 -> C++11 did bring about code bloat. However, even though we were building the same code base for two different systems, one of them without a C++11 compiler (so we could not use C++11 features), it is incorrect to say we were not using C++11.

Just turning on C++11 in the compiler brings move semantics to all standard library containers. The header files were not just "somehow" linked in and left unused; the additional code was used in many places behind the scenes. I haven't benchmarked my own code, but the general report from those who have is that, in exchange for the extra binary size, the code ran about 5% faster. For most people these days that expense is worth it.
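To make the "behind the scenes" part visible, here is a small sketch that counts how a std::vector relocates its elements when it grows. Tracked, Counts, counts() and grow_once() are hypothetical names for this illustration only. The explicit move constructor is there just so the moves can be counted; library types such as std::string get the same treatment without any change to user code.

```cpp
#include <cassert>  // only needed for the host-side check in the note below
#include <vector>

// Hypothetical bookkeeping for this illustration.
struct Counts {
    int copies;
    int moves;
};

inline Counts& counts() {
    static Counts c = {0, 0};
    return c;
}

// Element type that records how the container relocates it.
struct Tracked {
    Tracked() = default;
    Tracked(const Tracked&) { ++counts().copies; }
    Tracked(Tracked&&) noexcept { ++counts().moves; }
    Tracked& operator=(const Tracked&) = default;
    Tracked& operator=(Tracked&&) = default;
};

// Forces exactly one reallocation of a one-element vector and reports
// whether the existing element was copied or moved into the new storage.
inline Counts grow_once() {
    counts() = Counts{0, 0};
    std::vector<Tracked> v;
    v.reserve(1);
    v.emplace_back();             // constructed in place: no copy, no move
    v.reserve(v.capacity() + 1);  // must reallocate and relocate the element
    return counts();
}
```

Because the move constructor is noexcept, the vector relocates the element by moving it, so grow_once() reports zero copies and one move. That move path is extra code that a C++98 build of the same source would not contain.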

What do you mean by code bloat?
Binary size after compilation. Also build time: just turning on C++11 more than doubles the time to run the processor. (I hope C++20 modules fix this.)
Just curious: why did you choose C++ instead of C for embedded? Most shops I know chose C just because of code bloat.
I have to admit that C++ is still not the industry "go-to" language for embedded. But if you apply modern C++ correctly, there is very little overhead compared to C, and the software is much easier to maintain.

The performance of embedded MCUs keeps rising over the years, and that little overhead buys development speed.

Not to mention smart pointers, templates and constexpr making my life easier.

The only real issue with C++ is that as soon as you get into serious embedded applications, you have restrictions on heap usage; e.g. in medical devices, using the heap is forbidden. So you can't use the STL.

There is a promising embedded STL project, but it's not there yet: https://www.etlcpp.com

Shameless plug: At EA we've open-sourced our C++ standard library implementation that focuses on games. While it's not necessarily focused on embedded development, you might be interested in checking it out - it provides a number of fixed_* containers (fixed_vector, fixed_set, etc) which can be configured to use stack allocations only, among some other things you might find interesting :)

https://github.com/electronicarts/EASTL

(In case it wasn't obvious, disclaimer: I work at EA, and frequently do work on EASTL, though I'm not the primary maintainer.)

> So you can't use the STL.

std::array has been in there since C++11 and it doesn’t use the heap.

And/or you can use the STL with custom allocators that work without a heap. We did something similar when developing for the Nintendo Wii console. There was a heap, but we didn’t want to use it, to avoid memory fragmentation. AFAIR we used two arenas (essentially stacks): one very small, for temporary data, cleaned up at the start of each frame, and a large one cleaned up when a level is unloaded.

However, I don’t have hands-on experience developing firmware for medical devices, so I’m not sure it’ll work for them.
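A minimal sketch of the arena idea described above, written against the C++11 allocator model. This is not the Wii code; Arena and ArenaAllocator are hypothetical names, and the buffer size in the usage note is an arbitrary assumption. The arena hands out memory by bumping an offset into a fixed buffer and releases everything at once on reset(), which matches the "cleaned up at the start of each frame" pattern.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical bump allocator over a caller-supplied, fixed-size buffer.
class Arena {
public:
    Arena(void* buffer, std::size_t size)
        : begin_(static_cast<unsigned char*>(buffer)), size_(size), used_(0) {}

    // Returns the next suitably aligned chunk, or nullptr when exhausted.
    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(begin_);
        const std::uintptr_t mask = static_cast<std::uintptr_t>(align) - 1;
        const std::uintptr_t aligned = (base + used_ + mask) & ~mask;
        const std::size_t offset = static_cast<std::size_t>(aligned - base);
        if (offset > size_ || bytes > size_ - offset) {
            return nullptr;
        }
        used_ = offset + bytes;
        return begin_ + offset;
    }

    // Frees everything at once, e.g. at the start of a frame.
    void reset() { used_ = 0; }
    std::size_t used() const { return used_; }

private:
    unsigned char* begin_;
    std::size_t size_;
    std::size_t used_;
};

// Minimal C++11 allocator that forwards to an Arena; std::allocator_traits
// supplies the remaining members.
template <class T>
struct ArenaAllocator {
    using value_type = T;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <class U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept
        : arena(other.arena) {}

    T* allocate(std::size_t n) {
        if (n > static_cast<std::size_t>(-1) / sizeof(T)) throw std::bad_alloc();
        void* p = arena->allocate(n * sizeof(T), alignof(T));
        // With -fno-exceptions this throw would be replaced by a fatal handler.
        if (p == nullptr) throw std::bad_alloc();
        return static_cast<T*>(p);
    }

    // Individual frees are no-ops; memory comes back via Arena::reset().
    void deallocate(T*, std::size_t) noexcept {}

    Arena* arena;
};

template <class T, class U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return a.arena == b.arena;
}

template <class T, class U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) noexcept {
    return !(a == b);
}
```

Usage would be along the lines of: create an Arena over a static or stack buffer, construct an ArenaAllocator<int> from it, and pass that to a std::vector<int, ArenaAllocator<int>>. Any container using the arena has to be destroyed before reset() is called, because reset() invalidates everything the arena handed out.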

The problem for medical devices isn't so much whether custom allocators will work. The problem is whether the FDA will freak out because you're not following industry-best-practice coding guidelines.
That's really not how it works. The FDA is fundamentally concerned about two things, safety and efficacy. You need a plan to demonstrate the latter, and you need your quality system, SDP, etc. to demonstrate how you approach the former. This is about good engineering practices, not particular implementation techniques.

So you can do things many different ways. If you do say "we do this like X, which is industry standard, just like T, U, and V do" it's a simpler argument than "we do this like Y. Lots of people do X, but here is how we have demonstrated Y is better for us...". But this can be fine too, just possibly more work.

Also worth noting: (a) there is no industry-wide agreement on best practices, (b) there is no FDA-wide agreement on what should be done (different device types are reviewed by different panels), and (c) the FDA doesn't understand software development deeply across its panels, but it is catching up.

Sure, that's all true. I'm looking at the "simpler argument" part.

It's especially true if you're saying "This new device is just like our previous device, with these few small changes". (I forget what that's called, but you can do a lot less paperwork if that's true.) But if you start doing memory allocations where you never did before, they're probably going to want to apply higher scrutiny to your entire software. That's... painful.

Arenas/Pools/Planks have been industry best-practice for my entire career.
Also, the STL is data structures + algorithms. Even if you can't use the data structures out of the box, you might be able to use the algorithms. Boost.Intrusive provides STL-compatible data structures with full control of allocation (and more).
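As a small illustration of using the algorithms without the allocating containers, here is a sketch that runs std::nth_element and std::accumulate over a std::array and a plain C array. The function names median_of and mean_of are made up for this example. As far as I know, common implementations of these two algorithms work in place; std::stable_sort and std::inplace_merge are the ones to watch, since they may try to obtain a temporary buffer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <numeric>

// Median of a fixed-size window. The array is taken by value, so the
// working copy lives on the stack and the caller's data is untouched.
template <std::size_t N>
int median_of(std::array<int, N> samples) {
    static_assert(N % 2 == 1, "window size must be odd");
    std::nth_element(samples.begin(), samples.begin() + N / 2, samples.end());
    return samples[N / 2];
}

// Integer mean over any contiguous range, including a plain C array.
inline int mean_of(const int* first, const int* last) {
    assert(first != last);  // an empty range has no mean
    return std::accumulate(first, last, 0) / static_cast<int>(last - first);
}
```

For example, median_of on a std::array<int, 5> holding 5, 1, 4, 2, 3 gives 3, and mean_of over a C array holding 2, 4, 6, 8 gives 5.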
Out of curiosity, what is the rationale for not using the heap with medical devices? Resource constraints are one thing but that is not limited to medical nor is that entirely solved with preventing heap use. If it's for runtime safety to avoid raw pointers, has anyone done an analysis to determine if smart pointers (unique_ptr, shared_ptr), combined with diligent static code analysis diagnostics to avoid the kinds of issues Herb raises in the OP video, could reduce the risk to an acceptable level?
I have not worked in medical (my experience is in games), but my best guess is that it's more about reliability and predictability. Using a heap suffers from the fact that you can run out of memory to satisfy a malloc/new request (either due to the system memory limit or due to fragmentation).

With static memory techniques you can "prove" the system has enough memory to work in all modes, e.g. the device consumes 20 readings per second and keeps them in a ring buffer backed by a static fixed array, and that buffer is large enough for the processing rate.
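A sketch of that ring buffer, assuming the 20 readings per second from the example plus a made-up worst case of two seconds between drains. RingBuffer, ReadingBuffer and the two constants are hypothetical names. The capacity is a template parameter, so the memory requirement is fixed at compile time and a full buffer is reported to the caller instead of triggering an allocation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Fixed-capacity FIFO backed by a std::array; it never touches the heap.
template <class T, std::size_t N>
class RingBuffer {
    static_assert(N > 0, "capacity must be non-zero");

public:
    // Returns false (and stores nothing) when the buffer is full.
    bool push(const T& value) {
        if (count_ == N) return false;
        data_[(head_ + count_) % N] = value;
        ++count_;
        return true;
    }

    // Returns false when there is nothing to read.
    bool pop(T& out) {
        if (count_ == 0) return false;
        out = data_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }

    // Oldest element; the caller must check size() first.
    const T& front() const {
        assert(count_ > 0);
        return data_[head_];
    }

    std::size_t size() const { return count_; }
    static constexpr std::size_t capacity() { return N; }

private:
    std::array<T, N> data_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Assumed numbers: 20 readings per second, consumer drains at least
// once every 2 seconds in the worst case.
constexpr std::size_t kReadingsPerSecond = 20;
constexpr std::size_t kWorstCaseSeconds = 2;
using ReadingBuffer = RingBuffer<int, kReadingsPerSecond * kWorstCaseSeconds>;

static_assert(ReadingBuffer::capacity() == 40,
              "worst-case storage is known at compile time");
```

The static_assert is the "proof" in miniature: if someone changes the rate or the worst-case drain interval, the capacity argument has to be revisited before the code compiles again.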

> Out of curiosity, what is the rationale for not using the heap with medical devices?
Avoiding heap allocation is not at all a general constraint for medical devices. For certain types of components (think safety-critical real-time subsystems, for example) they are going to be very interested in your hazard analysis and the mitigating approaches to possible issues.

So if there is a way to say: we don't have to worry about [class of error X] because we don't ever do Y, that's a straightforward way to sort out those components. If you have a compelling tech reason to do Y, better start thinking about all the controls you'll put on it.

Think about it this way: What's the worst thing that can happen if your code causes an OOM error? If the answer includes things like "somebody dies if it happens at the wrong time", you'll want to be really careful to prove (prove, not just test out) that it can't happen.

Safety critical software (or even mission critical software) should not be using dynamic allocation for a few reasons.

- Fragmentation.

- Non-deterministic runtime (in the real-time cases).

- Insufficient analysis of worst-case conditions (i.e. you haven't worked out what your worst case RAM usage is, otherwise you would have statically allocated it).

IMO, the worst is the final case as it shows a lack of thoroughness in the design as a whole and brings the rest of the code into suspicion. Fragmentation can be worked around, but not the others.

When I was using C++ for embedded, which is admittedly over 15 years ago, I had to switch off RTTI and exception handling to get compact binaries. Basically just using classes and surface language features. We did use templates but only very selectively.

Is it still possible today to pare down the compiler output like that? I imagine a lot of modern C++ just doesn't work unless everything is enabled.
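For the exception and RTTI part specifically, GCC and Clang still accept -fno-exceptions and -fno-rtti. Code that is meant to build that way reports failures through return values instead of throw. A minimal sketch, with hypothetical names (ParseError, ParseResult, parse_u32):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Intended to build with: g++ -Os -fno-exceptions -fno-rtti
// (no throw, no dynamic_cast, no typeid anywhere in this file).

enum class ParseError { None, Empty, NotADigit, Overflow };

struct ParseResult {
    std::uint32_t value;
    ParseError error;
};

// Parses an unsigned decimal number; errors come back in the result
// instead of being thrown.
inline ParseResult parse_u32(const char* text) {
    assert(text != nullptr);  // precondition: caller passes a C string
    if (*text == '\0') return {0, ParseError::Empty};

    const std::uint32_t max = std::numeric_limits<std::uint32_t>::max();
    std::uint32_t value = 0;
    for (const char* p = text; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') return {0, ParseError::NotADigit};
        const std::uint32_t digit = static_cast<std::uint32_t>(*p - '0');
        // value * 10 + digit would exceed max exactly when this holds.
        if (value > (max - digit) / 10) return {0, ParseError::Overflow};
        value = value * 10 + digit;
    }
    return {value, ParseError::None};
}
```

The caller checks result.error before using result.value. Note that parts of the standard library are specified to throw on failure (std::vector::at, allocation failure in the containers), so how those behave under -fno-exceptions depends on the library implementation.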

Yes it is possible.

Check this talk about fitting C++17 on a C64.

CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”

https://www.youtube.com/watch?v=zBkNBP00wJE

Also be aware that embedded devices like Arduino and Cortex-M (with Mbed OS) do use C++ toolchains.

Modern C++ compilers do pretty well on a Commodore 64, let alone many of the typical embedded deployments outside pico-controllers.

CppCon 2016: Jason Turner “Rich Code for Tiny Computers: A Simple Commodore 64 Game in C++17”

https://www.youtube.com/watch?v=zBkNBP00wJE

Most of the time it is either religion against C++ or a lack of modern tooling, given that most embedded toolchains are stuck with C90 and C++98.

> religion against C++

possibly. but given how often i've seen c++ users treat c users like idiot savages or heathens that need conversion ("have you heard the good word of our lord and savior, c++?"), i could understand a negative sentiment.

Maybe if C developers hadn't been ignoring lint, and better type systems, since 1979, we would be having better conversations.

"Although the first edition of K&R described most of the rules that brought C's type structure to its present form, many programs written in the older, more relaxed style persisted, and so did compilers that tolerated it. To encourage people to pay more attention to the official language rules, to detect legal but suspicious constructions, and to help find interface mismatches undetectable with simple mechanisms for separate compilation, Steve Johnson adapted his pcc compiler to produce lint [Johnson 79b], which scanned a set of files and remarked on dubious constructions."

Dennis M. Ritchie -- https://www.bell-labs.com/usr/dmr/www/chist.html

I would like the stack underlying my computing needs not to look like a Swiss cheese.

And yes, C++ is also not the ultimate solution for that as it is tainted by its C compatibility.

Probably because it is much safer, faster to develop and easier to maintain.
If a new feature does not play well with an entrenched object hierarchy, one can as well do a rewrite. This has happened at a shop where I worked and the rewrite was in C, which turned out to have a faster development time and the result was more flexible.

Now, I'm aware that you can restrict yourself to "almost C" in C++, but no one ever seems to be doing that. A little std::string here, a tiny std::vector there, so exceptions are already in, so why not go all the way.

You can write modern C++ with no overhead on a system with just 16kB of scratchpad memory. It is much nicer to use than C (namespaces, auto, templates and lambdas alone).
And RAII! That's so nice in an RTOS. Never again forget to drop priorities or reenable interrupts just because you went down a not as well trodden failure path. You can even return the lock_guard, and the caller can continue to do work in the same atomically locked context, but if they choose not to, they just drop the return value and everything works as expected.
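A sketch of the pattern described above, runnable on a host. The hal functions are stand-ins for whatever the RTOS or MCU vendor provides (hypothetical names, simulated here with a nesting counter), and InterruptGuard and begin_update are made up for this example. One caveat: std::lock_guard itself is not movable, so returning a guard by value needs either a movable type like this one or std::unique_lock.

```cpp
#include <cassert>

// Stand-ins for the platform's interrupt control (hypothetical names),
// simulated with a nesting counter so the example runs on a host.
namespace hal {
inline int& disable_depth() {
    static int depth = 0;
    return depth;
}
inline void disable_interrupts() { ++disable_depth(); }
inline void enable_interrupts() {
    assert(disable_depth() > 0);
    --disable_depth();
}
inline bool interrupts_enabled() { return disable_depth() == 0; }
}  // namespace hal

// Disables interrupts on construction and re-enables them on destruction,
// on every path out of the scope. Movable, so it can be returned.
class InterruptGuard {
public:
    InterruptGuard() : active_(true) { hal::disable_interrupts(); }
    ~InterruptGuard() {
        if (active_) hal::enable_interrupts();
    }

    InterruptGuard(InterruptGuard&& other) noexcept : active_(other.active_) {
        other.active_ = false;  // the moved-from guard must not re-enable
    }

    InterruptGuard(const InterruptGuard&) = delete;
    InterruptGuard& operator=(const InterruptGuard&) = delete;
    InterruptGuard& operator=(InterruptGuard&&) = delete;

private:
    bool active_;
};

// Updates shared state and hands the still-held critical section to the
// caller, who may keep it alive or simply drop the return value.
inline InterruptGuard begin_update(int& shared, int value) {
    InterruptGuard guard;
    shared = value;
    return guard;
}
```

If the caller writes InterruptGuard g = begin_update(x, 5); interrupts stay disabled until g goes out of scope. If the caller ignores the return value, the temporary is destroyed at the end of the statement and interrupts are re-enabled right there.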
If there is a modern C++ compiler available for a platform, it is almost irresponsible to use straight C. That doesn't mean that everything needs to be pure idiomatic C++17, but destructors, templates, iteration, operator overloading and move semantics are still indispensable in nontrivial programs.

Avoiding bloat and performance hits is trivial in comparison to the structural and architectural benefits of the modern language.

You're overstating a bit, but there's a kernel of truth. However, it depends on your viewpoint. If you have a functional codebase in C and size is an important factor, you may be reluctant to trade up. Bloat avoidance is not trivial, in my opinion. Performance is probably less of a factor; most of the additional functionality generated by the C++ compiler has to be written by the C coder in the end.
> If you have a functional codebase in C

Rewriting something that already works would be silly of course.

> and size is an important factor you may be reluctant to trade up. Bloat avoidance is not trivial, in my opinion

I don't agree with this in comparison to C. C is still there, but you have destructors and ownership semantics on top of it. Even templates can be used for things like type checking in debug builds.

Good that you can specify -std=c++11 to prevent the bloat. The more recent 14 and 17 standards don't seem as groundbreaking and practical (both at the same time) as 11 was and, as you note, do seem to bring bloat.
I use C++17 in my toy code (which is for a mixture of fun and learning modern C++, so...) and the main reasons (besides learning) I use C++17 over C++11 are std::variant and (from C++14) make_unique. C++11 has a lot of compelling stuff, but 14 and 17 seem to be relatively minor improvements over 11. It’s a pity they introduce bloat... it definitely makes them much less compelling for use cases where it matters.

Nested namespace definitions and structured binding declarations are also nice.

Which features do you see causing problems? If constexpr has been fantastic, and outside of that, I mostly benefit from broader metaprogramming power.
Are you pulling in the standard library, or is this in just normal code?