by BeeOnRope 2404 days ago
The title was basically clickbait.

It reflects how I encountered the issue, but the investigation doesn't have much of that, because I was starting from a point where I suddenly had a slow and fast algorithm (the actual scenario was more complicated than shown).

The investigation doesn't have much to do with clang-format because I'm just trying to see why a raw loop is faster, and the difference ultimately has nothing to do with "raw" vs "std" at all, really. Only when I understood that includes and include order matter was I able to map it back to a header order swap made by clang-format.

> This part lost me because I can't see how you can use std::transform (which is in <algorithm>) without hitting this flaw.

No, two ways.

You can get the fast performance with std::transform if you happen to include <ctype.h> before <algorithm>. The order matters.

Similarly, you can get the slow performance with a raw loop if you happen to include <algorithm> before <ctype.h> in the file for the raw loop.
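To make the two cases concrete, here is a minimal sketch of the "fast" ordering as described above, with the C header included before the C++ one. The function names to_upper_transform and to_upper_raw are made up for illustration, and whether the include order actually changes the generated code depends on your toolchain and library versions, so treat this as an assumption to check on your own setup rather than a guaranteed reproduction.

```cpp
// "Fast" ordering as described above: the C header comes first.
// Swapping these two includes is the change the discussion is about;
// whether it affects codegen on your system is an assumption to verify.
#include <ctype.h>
#include <algorithm>
#include <string>

// Hypothetical helper: uppercases s in place using std::transform.
void to_upper_transform(std::string& s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(::toupper(c));
    });
}

// Hypothetical helper: the same operation written as a raw loop.
void to_upper_raw(std::string& s) {
    for (char& c : s) {
        c = static_cast<char>(::toupper(static_cast<unsigned char>(c)));
    }
}
```

Both functions sit in the same file and see the same toupper, so they should produce the same result and, per the argument above, perform the same. Only the include order at the top of the file decides which toupper they get.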

In fact, I'd say that the raw loop and std::transform will have the same performance almost all the time. If they are in the same file, they will have the same performance. If they are in a C++ file, you are almost certainly including some C++ header that triggers the issue. Only in special cases, such as a coding convention that separates C and C++ headers with whitespace and puts the C headers first (clang-format doesn't sort across headers separated by whitespace), are you likely to run into it "natively".

Even if you do, it's absolutely no fault of std::transform or <algorithm> - it's a weird effect of interaction between C, C++ and OS headers, nothing inherent to the C++ algorithms.


I bet the performance would look different in a real program anyways. The random input data probably results in more branch mispredictions than would occur in normal text, and the tight loop of the benchmark basically ensures the lookup table is always in cache.
Definitely, but I think it's unlikely that it will close the gap much between the fast and slow std::toupper() compilations (my own toupper is mostly just in there as an interesting reference point).

In addition to more instructions and a function call, the slow version has many more memory dereferences, so if everything is very cold it is likely to suffer more misses.

I disagree with your conclusion that the slowness doesn't matter. If it doesn't matter to anybody then why does the optimized version even exist?

IMO this is a library bug. <algorithm> shouldn't be hobbling toupper or anything else.

The conclusion near the end of the blog post?

That's not really because performance doesn't matter, but because toupper() shouldn't really matter in most modern C and C++ programs: it just can't support Unicode. So either you are using a different way of supporting strings entirely (if you need any kind of Unicode or multibyte support), or this is some internal ASCII-only text processing in which case you are better off using your own methods (e.g., the lookup table I mention at the end) since they'll provide an easy speedup and you aren't accidentally introducing locale-dependent code where you don't want it.