by haberman 4803 days ago
That is not what memory barriers are for, at all. Memory barriers are a sequencing primitive for shared-memory concurrency (an excellent intro is here: http://lxr.linux.no/linux/Documentation/memory-barriers.txt). They are never required for correctness in valid single-threaded programs.

The memory barrier "fixed" this program similarly to how a cruise missile "fixes" a termite problem. It was just a coincidence and it was the wrong tool for the job.

Except we're talking about an invalid program. The program is invalid as written. Therefore memory barriers are the antidote because they're necessary in this situation.

A tool doesn't have a purpose. It has capabilities, and understanding why something works (and why it can be relied upon) is all that matters.

Yes, it is an invalid program. The antidote is to fix it, not to jigger it in a way that happens to work. The memory barrier is not "necessary" -- it is not even a correct fix. Even with a memory barrier as you added it, it is still an invalid program that invokes undefined behavior. The memory barrier may have coincidentally fixed the problem on your system, but there is still no guarantee it will work on another architecture, another compiler, or even another version of the same compiler.

The problem with my program is that it casts an int32_t pointer to int16_t pointer. The correct fix is to not do that. "Fixing" the problem with a memory barrier is a step in the wrong direction.
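To make "not do that" concrete, here is a minimal sketch of the memcpy route, assuming the goal is simply to look at the first two bytes of the 32-bit object. The helper names first_half and store_then_read are made up for illustration and are not from the original program. The point is that no int16_t pointer to the int32_t object is ever dereferenced, so the aliasing rules are never broken.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy the first two bytes of a 32-bit object into
   an int16_t. No int16_t* pointing at the int32_t is ever dereferenced. */
int16_t first_half(const int32_t *src) {
    int16_t out;
    memcpy(&out, src, sizeof out);  /* byte-wise copy is allowed for any object */
    return out;
}

/* Store through the int32_t pointer, then inspect the object via memcpy.
   The compiler has to treat the memcpy as a read of *x, so the store
   cannot be reordered past it or dropped. */
int16_t store_then_read(int32_t *x) {
    *x = 5;
    return first_half(x);
}
```

Which half of the value you get back depends on byte order: on a little-endian machine the first two bytes hold the low half of the value, on a big-endian machine the high half. Compilers typically turn a small fixed-size memcpy like this into a plain load, so it should not cost anything over the cast.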

> but there is still no guarantee it will work on another architecture, another compiler, or even another version of the same compiler.

My point is that it is guaranteed to work. A memory barrier guarantees that all memory operations before the barrier take effect before any operations after the barrier.

I think this whole exchange is fascinating because it illustrates two completely different philosophies to hacking. Both are equally valid. I tend to prefer yours because it tends to result in shorter programs. Yet this is just a programmer convention. The machines do not care.

Yet there are some instances where my philosophy -- understanding which rules may be safely ignored -- has paid off. For example, if your invalid program were in a closed-source library which I was forced to interface with, then the program can't simply be fixed. In that case, a memory barrier would probably be the cleanest workaround.

It's an unfortunate fact that this type of situation -- broken third-party code that can't be fixed and can't be replaced -- is quite common in the field. It seems like it's an important skill for an engineer to know how to handle such situations.

EDIT: By the way, Scrybe Music looks really cool!

It isn't guaranteed to work even with the memory barrier, because the undefined behavior is not merely an ordering problem. The problem is that just accessing the object through a pointer of the wrong type breaks the rules and gives the compiler a license to do anything.

There is a time and place to break the rules, but it is a calculated risk. It can only be considered "safe" if you make assumptions about your environment (platform, toolchain, etc). You're vulnerable if any of those assumptions change. The things people considered "safe" 10 years ago aren't "safe" any more. But the people who followed the rules never have to change their approach.

For what it's worth, a cheaper barrier in this case (if you were going to take that route) is just a compiler barrier like __asm__ __volatile__ ("" ::: "memory"); (see: http://en.wikipedia.org/wiki/Memory_barrier#Out-of-order_exe...). There's no need to emit an actual CPU barrier.
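A rough sketch of the difference, assuming GCC or Clang (both the inline asm syntax and the __sync_synchronize builtin are compiler extensions, not standard C). The macro names and the shared_value variable are made up for illustration. Neither barrier makes the aliasing program valid; this only shows what each one asks of the toolchain.

```c
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang extension): an empty asm statement with
   a "memory" clobber. It emits no instruction; it only stops the compiler
   from moving or caching memory accesses across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Full barrier (GCC/Clang builtin): constrains the compiler and also asks
   the CPU to order memory operations, typically via a fence instruction. */
#define CPU_BARRIER() __sync_synchronize()

int32_t shared_value;  /* hypothetical global, for illustration only */

int32_t store_twice(void) {
    shared_value = 1;
    COMPILER_BARRIER();  /* compiler must assume memory is read here, so the
                            store of 1 is not merged into the store of 2 */
    shared_value = 2;
    CPU_BARRIER();       /* same compiler effect, plus hardware ordering */
    return shared_value;
}
```

In a single-threaded program the CPU-level ordering buys nothing, because a core always observes its own stores in program order. That is why the compiler-only form is the cheaper of the two here.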

Thanks about Scribe; it's a labour of love.

I think that your original suggestion (use memcpy) is a better solution than volatile.

It looks like the following code would work:

  #include <stdio.h>
  #include <stdint.h>

  void f(volatile uint32_t *x, volatile uint16_t *y) {
    *x = 5;
    printf("%d\n", *y);
  }

  int main() {
    volatile uint32_t x = 10;
    f(&x, (volatile uint16_t*)&x);
  }

And the compiler is guaranteed (*) to issue the store for *x = 5 and then the load from *y, but the code looks pretty ugly.

(*) assuming no alignment problems