| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by chisophugis 4194 days ago

What Bjarne doesn't mention is the enormous difference in code size between qsort and std::sort. The flexibility of having the compiler generate a sorting routine from std::sort is convenient but enormously redundant in many cases. In LLVM, we have array_pod_sort which is just a thin wrapper around qsort in order to avoid the code bloat of std::sort: http://llvm.org/docs/doxygen/html/namespacellvm.html#ae5788f...

For example, the following generates about 2KB of instructions (and will for basically every new type you want to sort):

#include <algorithm>

struct SomeStruct { int X; };

void foo(SomeStruct *SS, int NSS) { std::sort(SS, SS + NSS, [](SomeStruct LHS, SomeStruct RHS) { return LHS.X > RHS.X; }); }

A qsort equivalent will only emit code for the comparator which is just a handful of instructions.

C++ templates may be type safe and all, but at the end of the day they spew duplicated code just as much as those header-only macro-based C containers and algorithms; really more because it's less painful to write templates (vs. macros) and so you do it more, and there is more stuff in the templates. So even though in general the specialized generated code might be faster in most cases (as Bjarne likes to tout), the overall hit on your code size (and i-cache) can be dreadful. Currently, avoiding this issue in C++ just requires diligence on the part of the coder (some optimizations like LLVM's mergefunc can help, but in general it is a pretty hard problem and compilers are not Sufficiently Smart (TM) yet).

4 comments

btmorex 4194 days ago

As long as std::sort can fit in the instruction cache, who cares? (outside the embedded world obviously) It will always be faster unless you're getting regular instruction cache misses.

link

makomk 4194 days ago

Even if each instance of std::sort fits in the instruction cache, it's helping to push some other code elsewhere in the program out of the instruction cache, slowing that code down in the process.

link

humanrebar 4194 days ago

If std::sort pushes something out of icache, it's to make room for inlining swaps and compares. I'm skeptical that the other things bumped from icache should be kept hotter.

link

comex 4194 days ago

In addition to the "unless you sort 10 elements" factor, as stated in a sibling comment -

LLVM + libclang together are already 46MB of code, and they're libraries designed to be statically linked. Sure, they aren't going to fill your drive, but they will take a comparative while to load from disk (especially as part of another application - doesn't matter much on the command line), and it's that much harder to justify including them in an otherwise small downloaded package.

link

ant6n 4194 days ago

Unless you sort 10 elements

link

rasz_pl 4194 days ago

Yes, and if it doesnt just wait for better CPU, right? You work for microsoft?

link

humanrebar 4194 days ago

If it doesn't, use qsort. C++ lets you do that.

link

gsg 4194 days ago

Yep. I remember seeing a presentation by one of the graphics engine shops that indicates they did the same thing. (I'll see if I can dig it up...)

Edit: it was DICE, see http://www.slideshare.net/DICEStudio/executable-bloat-how-it... - skip to page 14.

link

humanrebar 4194 days ago

Your concern is valid, especially since it includes measurements on a real project. But I'm not sure it's a major concern. One can (and you already do) use qsort when binary sizes become a concern.

This is an interesting optimization, but it's not suitable for a beginner-to-intermediate C++ audience, which is who Stroupstrup is addressing. The sensible default is to use std::sort.

If someone can (and it looks like you have) measure a benefit in doing something special, then she should have at it. If an entire project, again with measurements, can prove that std::sort shouldn't be its default sort algorithm, that's fine too.

link

infogulch 4194 days ago

Does C++ not deduplicate templates that work on the same size types? E.g. if I used std::sort on arrays of `int`, `unsigned`, `float`, and half a dozen structs of exactly 4 bytes, I don't see why there would need to be more than one copy of the template in the final binary.

link