| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mumblemumble 1816 days ago
	This problem isn't specific to global variables; it happens with all shared mutable state. I would assume that the author only used global variables because that lets them keep the working examples as short as possible, and minimize irrelevant details. And yes, you can put a full memory fence around every access to a variable that is shared across threads. But doing so would just destroy the performance of your program. Compared to using a register, accessing main memory typically takes something on the order of 100 times as long. Given that we're talking about concerns that are specific to a relatively low-level approach to parallelism, I think it's safe to assume that performance is the whole point, so that would be an unacceptable tradeoff.

1 comments

dragontamer 1816 days ago

> And yes, you can put a full memory fence around every access to a variable that is shared across threads. But doing so would just destroy the performance of your program. Given that we're talking about concerns that are specific to a relatively low-level approach to parallelism, I think it's safe to assume that performance is the whole point, so that would be an unacceptable tradeoff.

Indeed.

Just a reminder to everyone: your pthreads_mutex_lock() and pthreads_mutex_unlock() functions already contain the appropriate compiler / cache memory barriers in the correct locations.

This "Memory Model" discussion is only for people who want to build faster systems: for people searching for a "better spinlock", or for writing lock-free algorithms / lock-free data structures.

This is the stuff of cutting edge research right now: its a niche subject. Your typical programmer _SHOULD_ just stick a typical pthread_mutex_t onto an otherwise single-threaded data-structure and call it. Locks work. They're not "the best", but "the best" is constantly being researched / developed right now. I'm pretty sure that any new lockfree data-structure with decent performance is pretty much an instant PH.D thesis material.

-----------

Anyway, the reason why "single-threaded data-structure behind a mutex" works is because your data-structure still keeps all of its performance benefits (from sticking to L1 cache, or letting the compiler "manually cache" data to registers when appropriate), and then you only lose performance when associated with the lock() or unlock() calls (which will innately have memory barriers to publish the results)

That's 2 memory barriers (one barrier for lock() and one barrier for unlock()). The thing about lock-free algorithms is that they __might__ get you down to __1__ memory barrier per operation if you're a really, really good programmer. But its not exactly easy. (Or: they might still have 2 memory barriers but the lockfree aspect of "always forward progress" and/or deadlock free might be easier to prove)

Writing a low-performance but otherwise correct lock free algorithm isn't actually that hard. Writing a lock free algorithm that beats your typical mutex + data-structure however, is devilishly hard.

link

voidnullnil 1816 days ago

> This "Memory Model" discussion is only for people who want to build faster systems: for people searching for a "better spinlock", or for writing lock-free algorithms / lock-free data structures.

Actually, most practitioner code has bugs from their implicit assumptions that shared variable writes are visible or ordered the way they think they are.

link

dragontamer 1816 days ago

But the practitioner doesn't need to know the memory model (aside from "memory models are complicated").

To solve that problem, the practitioner only needs to know that "mutex.lock()" and "mutex.unlock()" orders reads/writes in a clearly defined manner. If the practitioner is wondering about the difference between load-acquire and load-relaxed, they've probably gone too deep.

link

voidnullnil 1816 days ago

> To solve that problem, the practitioner only needs to know that "mutex.lock()"

This is true, but they do not know that. If you do not give some kind of substantiation, they will shrug it off and go back to "nah this thing doesn't need a mutex", like with a polling variable (contrived example).

link

Kranar 1816 days ago

Can you explain what you mean by a "polling variable" needing a mutex? Usually polling is done using atomic instructions instead of a mutex. Are you referring to condition variables?

link

mumblemumble 1816 days ago

In a lot of code I've seen, there are threads polling some variable without using any sort of special guard. The assumption (based, I assume, on how you really could get away with this back in the days of single-core, single-CPU computers) is that you only need to worry about race conditions when writing to primitive variables, and that simply reading them is always safe.

link