Hacker News
by sjolsen 3695 days ago
>It seems keywords are added to ensure serialization/atomicity, then compilers find a way to optimize it away/make it useless

No, what happens is keywords (or more broadly, semantics) are added which provide certain guarantees given certain preconditions. Then, developers write code that depends on those guarantees and fails to meet the preconditions, but happens to work anyway. At some point, the implementation adds some optimization which -- while still providing the appropriate guarantees when the preconditions are met -- breaks that code.

I know C and C++ can be a royal pain in the ass sometimes, but I have little sympathy for developers who flip the "I know what I'm doing, please assume my code is right and make it run fast" switch, then get upset with the compiler when it turns out that their code actually _isn't_ right and the compiler performs an optimization they weren't expecting. If you don't want the compiler to reorder your memory access, then don't fuck around with the memory_order flags.
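To make the "guarantees given preconditions" point concrete, here is a minimal sketch of a release/acquire handoff. The names `publish_and_read`, `payload` and `ready` are made up for illustration. The guarantee (the consumer sees the payload) only holds because both sides use the matching orders; swapping both to `memory_order_relaxed` would turn the read of `payload` into a data race, even if it happened to work on a given machine.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example function: hands one int from a producer thread
// to a consumer thread and returns what the consumer observed.
int publish_and_read() {
    int payload = 0;                 // plain, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = 0;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload may not be reordered after this store.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: once this load observes true, the payload write is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(ready.load());  // the flag was published before the threads finished
    return seen;
}
```

Calling `publish_and_read()` returns 42. Leaving the orders at their default (`memory_order_seq_cst`) would give the same result with even stronger ordering, which is the "don't touch the flags" option.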


On that particular point of memory orders, this is true.

But on other points about how compilers optimize: the simple alternative (-O0) makes the binary painfully slow, while there are a dozen flags that would make the binary both fast and safe, yet those safe flags are not enabled by default. The unsafe mode is what you get by default when you compile with -O2. How is this useful? It is just crazy.

> Then, developers write code that depends on those guarantees and fails to meet the preconditions, but happens to work anyway.

This reminds me of the memcpy/memmove issue. You should use memmove, not memcpy, when the areas overlap.

The real question is: why are there two versions? Checking whether the memory areas overlap is very cheap compared to copying, let's say, 16 elements. And if you're memcpy'ing smaller sizes, you can probably do it manually.

> I know C and C++ can be a royal pain in the ass sometimes

Yes, they lack enough information to know what you are actually trying to do and have to guess a lot of things. Not sure how "C++ smart" modern compilers are. (Example: if you pass an object by value and only read one field, can the compiler optimize this?)

> Example: you pass an object by value, and only read one field, can it optimize this?

Yes.

    struct S { int a,b; };
    void foo(int); // Some external function
    void bar(S s) { foo(s.a); }
    void baz() { bar(S{42, 55}); }
compiles the baz function to the following (-O2 on GCC 4.9):

    movl    $42, %edi
    jmp     _Z3fooi
Copying short non-overlapping chunks of memory is common in a lot of workloads (think manipulation of short strings) and it does happen in tight loops where an overlap check with two additions, two comparisons and two branches is a comparable amount of work to what memcpy() does.

Why should I have to manually implement memcpy() for small sizes when I know the memory won't overlap?
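For reference, the overlap check being argued about can be sketched roughly like this. `ranges_overlap` and `checked_copy` are hypothetical names, not anything from a real libc, and the sketch ignores address wrap-around at the very top of the address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: true if [a, a+n) and [b, b+n) share at least one byte.
// Addresses are compared as integers, since relational comparison of
// pointers into unrelated objects is not well defined.
bool ranges_overlap(const void* a, const void* b, std::size_t n) {
    auto pa = reinterpret_cast<std::uintptr_t>(a);
    auto pb = reinterpret_cast<std::uintptr_t>(b);
    return pa < pb + n && pb < pa + n;  // two additions, two comparisons
}

// Hypothetical "single version" copy: check first, then dispatch.
void* checked_copy(void* dst, const void* src, std::size_t n) {
    if (ranges_overlap(dst, src, n)) {
        return std::memmove(dst, src, n);  // defined for overlapping ranges
    }
    return std::memcpy(dst, src, n);       // ranges are known to be disjoint here
}
```

For a 4- or 8-byte copy in a tight loop, the check in `ranges_overlap` is about as much work as the copy itself, which is the cost a caller who already knows the ranges are disjoint would rather not pay.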

I see they've finally changed the memcpy() specification to forbid overlapping memory areas completely.

Previously, it was defined to copy from low addresses to high addresses, which meant you could use this to fill an array:

    p[0] = 42;
    memcpy(&p[1], &p[0], sizeof(p)-sizeof(*p));
And people did.
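
For comparison, here is a sketch of a fill that gets the same result using only non-overlapping memcpy calls, so it does not depend on copy direction. `fill_by_doubling` is a hypothetical helper written for this comment; in practice `std::fill(p, p + n, 42)` or a plain loop is the obvious choice.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: fill p[0..n) with value by repeatedly copying the
// already-filled prefix onto the region right after it.
void fill_by_doubling(int* p, std::size_t n, int value) {
    if (n == 0) return;
    p[0] = value;
    std::size_t filled = 1;
    while (filled < n) {
        // chunk <= filled, so source [0, chunk) and destination
        // [filled, filled + chunk) never overlap.
        std::size_t chunk = std::min(filled, n - filled);
        std::memcpy(p + filled, p, chunk * sizeof(int));
        filled += chunk;
    }
}
```

Each memcpy call here copies between disjoint ranges, so it stays within what the function promises regardless of how the library implements the copy.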
The original 1989 ANSI C specification stated, in "4.11.2.1 The memcpy function":

> If copying takes place between objects that overlap, the behavior is undefined.

So it has been this way in C since the first official standard. I don't have a first edition K&R so I can't see what that said, though.

Huh. You're quite right.

Well, it may have been undefined, but in practice the behaviour was standardised and couldn't be changed without breaking existing code... which is basically C in a nutshell.

Code like your example would have been broken by real standard library implementations very early in the piece: even a forward-copying implementation that copies a word at a time (a straightforward and obvious optimisation with real and significant benefits on many machines of the era) reads several source bytes before the earlier stores have reached them, so the fill value never propagates.
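
A small simulation of that effect, using two hand-written copy loops as stand-ins (these are hypothetical functions for illustration, not any real libc's memcpy, and a 4-byte word is an assumption):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a naive forward, byte-at-a-time copy.
void copy_forward_bytes(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Hypothetical stand-in for a forward copy that moves 4 bytes per step.
void copy_forward_words(unsigned char* dst, const unsigned char* src, std::size_t n) {
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        std::uint32_t w;
        std::memcpy(&w, src + i, 4);  // load a whole word first...
        std::memcpy(dst + i, &w, 4);  // ...then store it
    }
    for (; i < n; ++i) dst[i] = src[i];  // leftover tail bytes
}
```

Starting from a zeroed 9-byte buffer with the first byte set to 42 and copying 8 bytes from offset 0 to offset 1, `copy_forward_bytes` leaves all nine bytes equal to 42, while `copy_forward_words` leaves only the first two equal to 42 and the rest zero. Both copy "from low to high", but only the byte-wise one produces the fill.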
> The real question is: why are there two versions?

Because long, long ago, when CPUs were slow and compilers and standard libraries were stupid, there were two versions.