by jcranmer 2401 days ago
It's not what DEC Alpha did, but what it didn't do. ;-)

The issue that comes up on Alpha is this code:

  thread1() {
    x = …; // Store to *p
    release_barrier(); // guarantee global happens-before
    p = &x; // ... and now store the p value.
  }
  
  thread2() {
    r1 = p; // If this reads &x from thread1,
    r2 = *r1; // this doesn't have to read the value of x!
  }
The Alpha's approach to memory was to impose absolutely no ordering constraints unless you asked for them. Each CPU also had two cache banks, which means that from the hardware perspective you can think of it as having two threads reading from memory, each performing its own coherency logic. So one cache bank can read the value of p and, having processed all pending cache traffic, see both stores; then the other cache bank, asked to read *p while still behind on that traffic, hasn't seen either store yet.

Architectures with only one cache bank don't have this problem. Other architectures with cache banks feel obligated to solve the issue by adding extra hardware that makes sure the second cache bank has processed enough cache coherency traffic not to be behind the first one whenever a dependency chain crosses cache banks.
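For reference, here is roughly how the same publish-then-dereference pattern looks in portable C++11. This is only a sketch: `payload` and `published` are made-up names standing in for x and p, and the release store stands in for release_barrier(). The acquire load on the reader side is what covers Alpha-style hardware. memory_order_consume was meant for exactly this dependency case, but as far as I know compilers generally just treat it as acquire.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names: `payload` plays the role of x, `published` the role of p.
int payload = 0;
std::atomic<int*> published{nullptr};

// thread1: write the data, then publish the pointer with release semantics.
void writer() {
    payload = 42;
    published.store(&payload, std::memory_order_release);
}

// thread2: returns true and fills `out` if the pointer had been published.
bool reader(int& out) {
    // The acquire load orders the dependent read below after the pointer
    // load, even on hardware that does not honour address dependencies.
    int* r1 = published.load(std::memory_order_acquire);
    if (r1 == nullptr) {
        return false;           // nothing published yet
    }
    assert(r1 == &payload);     // only one object is ever published here
    out = *r1;                  // sees 42 once the pointer is visible
    return true;
}
```

Run writer() on one thread and reader() on another; whenever reader() returns true, the value it hands back is the one stored before the pointer was published.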

1 comment

So on every platform other than Alpha, thread2 works correctly without any barrier?

Does this mean that when you use double-checked locking on p on non-Alpha systems, you do not need any kind of synchronization on the fast path where p is already initialized?

So this would be correct?

    if (!p) {
      T *x = new T;
      release_barrier();
      if (!compare_and_swap(p, 0, x))
        delete x;
    }
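
For comparison, a conservative portable version of the same lazy initialization in C++11 might look like the sketch below. It does not answer whether the plain fast-path read is safe on non-Alpha hardware; it just shows the form that avoids having to ask. `T`, its `value` field, and `get_instance` are hypothetical names for illustration, and the compare-exchange stands in for compare_and_swap.

```cpp
#include <atomic>
#include <cassert>

struct T { int value = 7; };    // hypothetical payload type

std::atomic<T*> p{nullptr};

T* get_instance() {
    // Fast path: an acquire load, so the contents of *p are visible
    // whenever the pointer itself is.
    T* cur = p.load(std::memory_order_acquire);
    if (cur == nullptr) {
        T* fresh = new T;
        // On success the acq_rel exchange publishes the constructed T.
        // On failure `cur` is updated to the winner's pointer, and the
        // acquire ordering makes the winner's T visible to this thread.
        if (p.compare_exchange_strong(cur, fresh,
                                      std::memory_order_acq_rel,
                                      std::memory_order_acquire)) {
            cur = fresh;
        } else {
            delete fresh;       // lost the race, discard our copy
        }
    }
    assert(cur != nullptr);
    return cur;
}
```

Every caller gets the same pointer back, and the loser of a racing initialization frees its own allocation, as in the snippet above.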