I've been wondering about the impact of instruction cache misses and the potential downsides of executable bloat caused by C++ templates, but have never seen a real case of this being a problem.
In 2010, Endeca's analytics engine executable was growing substantially as more template instantiations were introduced. The engine eventually crossed the threshold where instruction cache misses outweighed the benefit of avoiding type lookups at runtime, and the team agreed to stop pursuing template instantiation so aggressively as a performance enhancement.
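To make that trade-off concrete, here is a minimal sketch of the two approaches. It is not Endeca's code: `Value`, `sum_dynamic`, and `sum_static` are hypothetical names, and a two-type `std::variant` column (C++17) is assumed purely for illustration.

```cpp
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical column value whose type is only known at run time.
using Value = std::variant<std::int64_t, double>;

// Runtime lookup: a single copy of the loop ends up in the binary,
// but every element pays for a type check.
double sum_dynamic(const std::vector<Value>& column) {
    double total = 0.0;
    for (const Value& v : column) {
        if (const auto* i = std::get_if<std::int64_t>(&v)) {
            total += static_cast<double>(*i);
        } else {
            total += std::get<double>(v);
        }
    }
    return total;
}

// Template: no per-element type check, but the compiler emits a
// separate copy of the loop for every T it is instantiated with.
template <typename T>
double sum_static(const std::vector<T>& column) {
    double total = 0.0;
    for (const T& x : column) {
        total += static_cast<double>(x);
    }
    return total;
}
```

With two types the extra code is trivial. The growth comes from multiplying many operations by many types, since each combination that gets instantiated is another function body competing for instruction cache.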
I wonder if modern compilers would have allowed the team to keep at it.
It's been a while, but if I recall correctly, the optimized, stripped executables at the time were a bit under 100 MB, and over 500 MB with debug symbols, etc.
Executable bloat can be avoided by stripping the executable afterwards, if that's what you mean. Although a function body will have been emitted for anything that is instantiated and used, the compiler will probably have inlined a huge amount of the code that is actually used. A lot of templated stuff is made up of one-liners or single-use functions, which compilers are very good at reasoning through these days.
I guess a naive compiler could have issues, but every time I check something on Compiler Explorer, the modern big boys (GCC, LLVM, ICC?) are pretty shrewd (especially if you optimise for size).
While aggressive inlining eliminates function call overhead, it can actually exacerbate instruction cache misses because it makes the executable larger.
I don't understand where this sentiment comes from. I once saw a complete high-level networking algorithm implemented with boost.asio reduced to fewer than 250 instructions...
Modern C++ compilers are pretty good at minimizing template bloat. As with most things in C++, it requires some awareness and intent to optimize this, but in many cases it isn't the issue it used to be.
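One common way to apply that awareness is to hoist the type-independent work out of the template, so each instantiation is only a thin wrapper. The sketch below is an illustration under assumptions, not anything from a particular codebase: `count_matching_bytes` and `count_equal` are hypothetical names, and the bytewise comparison is restricted to integral types, where it agrees with `==`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Type-independent core: compiled once, shared by every instantiation.
// Hypothetical helper; compares raw bytes, so it only suits types whose
// equality is bytewise.
std::size_t count_matching_bytes(const void* data, std::size_t count,
                                 std::size_t elem_size, const void* needle) {
    assert(elem_size > 0);
    const auto* bytes = static_cast<const unsigned char*>(data);
    std::size_t hits = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (std::memcmp(bytes + i * elem_size, needle, elem_size) == 0) {
            ++hits;
        }
    }
    return hits;
}

// Thin template wrapper: each instantiation is little more than a call
// that forwards sizeof(T), so adding another T adds very little code.
template <typename T>
std::size_t count_equal(const std::vector<T>& values, const T& needle) {
    static_assert(std::is_integral_v<T>,
                  "bytewise comparison is only assumed safe for integers");
    return count_matching_bytes(values.data(), values.size(),
                                sizeof(T), &needle);
}
```

The catch is that the shared loop no longer knows the element size at compile time, so it gives up some of the per-type optimization that templates were buying in the first place. Whether that is a net win is exactly the kind of thing you would have to measure.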
perf-tools are my favorite. The overhead is negligible, and thus any metrics you gather are very accurate. Valgrind is rarely useful, considering the execution time disadvantage.