|
|
|
|
|
by davidtgoldblatt
3657 days ago
|
|
I assume interrupts would be disabled or deferred during this section, so you get stronger guarantees on worst-case behavior; hence the bound on the number of cycles a thread is allowed to execute for. On one of the less common unixes (Sun's maybe? I don't totally remember), you could set a bit in some thread-specific part of memory to indicate that you were uninterruptible. If the OS saw that bit was set, it would set a "you need to yield" bit in the same part of memory and then allow the thread to continue (and penalize threads that abused the feature). As a practical matter, I suspect that the scheme proposed wouldn't have advantages over a careful use of HTM. I also suspect that the "lock two cachelines" thing is harder to implement than it sounds. The way Intel implements atomic operations that span cachelines now involves waiting for a interconnect quiescence period (taking O(microseconds)). What's proposed here is strictly more general. |
|
As a practical matter, HTM is dead. It's 2x slower than locking in the common cases: uncontended critical section or contended critical section that has a race. It also prevents you from doing effects outside of memory (it's just guaranteed to revert to locks in that case, so you pay all of the overhead of HTM and all of the overhead of locks).
Interesting about the difficulty of locking two cache lines. To me the important thing to keep in mind when talking about concurrency alternatives to the status quo is: nobody has proved that contended-but-not-racy critical sections are common. According to my data, they are unicorns. This means that from a performance standpoint, any proposed concurrency protocol that is different from locking must justify how it plans to beat locks on the two things that they are super good at, which are uncontended critical sections (with locks you pay one or two CASes on cache lines you probably already own) and contended-and-racy critical sections (with locks you pay for some CASes but the lock will make threads contend as quietly as possible to allow the lock owner to proceed quickly). If a proposal is worse than locks on the two most common kinds of critical sections, then that's silly. AFAICT, all of these trasactions-or-better-CAS approaches would only help in the case of contended-but-not-racy critical sections, which are too rare to matter. The two cache line thing definitely sounds like it will be more expensive than locking in the uncontended case, and I don't see how it will help in the contended-but-racy case.
EDIT: I said "contended-but-racy" when I meant "contended-but-not-racy" in one place, above. Fixed.