by loeg 1457 days ago
Minute -- you're often going to be better off eliminating the Arc/Mutex anyway. A (very) small but concrete win for some workloads.

> you're often going to be better off eliminating the Arc/Mutex anyway

Not always. Mutexes can be really fast (10-20ns), especially since they often optimistically spin, and Arc in Rust is (often) relatively low cost since you can hand out "free" refs without touching the atomic.
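To illustrate the "free refs" point: a minimal sketch, assuming a hypothetical helper `total_len` that only needs a plain borrow. Lending out `&T` from an `Arc<T>` leaves the strong count alone; only an explicit `Arc::clone` performs the atomic increment.

```rust
use std::sync::Arc;

// Hypothetical helper: takes a plain borrow, so no refcount traffic happens here.
fn total_len(words: &[String]) -> usize {
    words.iter().map(|w| w.len()).sum()
}

// Returns the strong count before lending the data out, after lending it out
// by reference, and while one real clone is alive.
fn refcount_demo() -> (usize, usize, usize) {
    let shared: Arc<Vec<String>> = Arc::new(vec!["arc".to_string(), "mutex".to_string()]);
    let before = Arc::strong_count(&shared);

    // Deref coercion turns &Arc<Vec<String>> into &[String]; no atomic increment.
    let _ = total_len(&shared);
    let after_borrow = Arc::strong_count(&shared);

    // Only an explicit clone bumps the atomic counter.
    let extra = Arc::clone(&shared);
    let with_clone = Arc::strong_count(&shared);
    drop(extra);

    (before, after_borrow, with_clone)
}

fn main() {
    let (before, after_borrow, with_clone) = refcount_demo();
    println!("before={before} after_borrow={after_borrow} with_clone={with_clone}");
}
```

So the atomic cost is paid once per owner (typically once per thread), not once per access.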

If removing the Arc/Mutex would require extra allocations, the Arc/Mutex version could easily be faster.

> > often

> Not always

Yeah, that's what "often" means.

> Mutexes can be really fast (10-20ns)

Notably, still worse than 0 ns. Ditto for Arc's refcounting and additional allocation. I'm not saying go on a crusade against Arc+Mutex here, but the easiest way to make effective use of modern multicore CPUs is to go to shared-nothing, independent data-per-thread designs (obviating Arc+Mutex). And if you aren't using Arc+Mutex, it's harder to accidentally share mutable state between threads.
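A rough sketch of what "independent data-per-thread" can look like, assuming the work splits cleanly into chunks. The function name `parallel_sum` and the chunking scheme are illustrative only. Each thread gets its own disjoint slice and its own accumulator, and results are combined only at join time, so no Arc or Mutex is involved.

```rust
use std::thread;

// Sum a slice by handing each thread its own disjoint chunk.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in exactly one chunk;
    // `chunks` requires a nonzero size, hence the final `.max(1)`.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn everything first, then join, so the chunks run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1_000).collect();
    println!("sum = {}", parallel_sum(&data, 8));
}
```

Scoped threads (`std::thread::scope`) let the workers borrow the input directly, which is why no reference counting is needed here. Whether a real workload partitions this neatly is the open question.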

I just think people seriously overestimate the cost of a mutex when implemented efficiently. Unlocking a mutex can be ~10-20x faster than fetching a value from main memory, or just a bit slower than a few integer operations. The way people talk about mutex operations you'd think that it's akin to hitting disk when it's actually a few orders of magnitude closer to hitting your L2 cache.

It gets a lot more expensive if you’re actually contending the mutex between threads; and if you’re not, why use a mutex? I agree the uncontended case is fast -- it’s just not very useful.

There are a lot of scenarios where you're rarely contended but you cannot rule it out, so for correctness reasons you should use mutual exclusion but your measured performance in the real world essentially never cares about the contended case.

Modern fast mutexes are perfect for that, because their uncontended case is so good. This also nudges the programmer toward the correct choice: you should prefer to write code that is less often contended, not fight hard to get better contended performance at the cost of worse uncontended performance. Contention is bad even if your mutual exclusion primitive performs well.
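If you want to check the uncontended cost on your own machine, here is a minimal timing sketch. It is not a rigorous benchmark: the loop is single-threaded, the function name is made up for illustration, and the numbers will vary with the CPU, the OS, and the Rust version.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Lock and unlock a mutex `iterations` times from a single thread,
// so every acquisition is uncontended.
fn time_uncontended_locks(iterations: u64) -> (u64, Duration) {
    let counter = Mutex::new(0u64);
    let start = Instant::now();
    for _ in 0..iterations {
        // The guard is dropped at the end of the statement, releasing the lock.
        *counter.lock().unwrap() += 1;
    }
    let elapsed = start.elapsed();
    (counter.into_inner().unwrap(), elapsed)
}

fn main() {
    let n = 1_000_000;
    let (count, elapsed) = time_uncontended_locks(n);
    println!(
        "{count} lock/unlock pairs in {elapsed:?} (~{:.1} ns each)",
        elapsed.as_nanos() as f64 / n as f64
    );
}
```

Build with `--release` before reading anything into the output, and use a proper benchmark harness for numbers you intend to rely on.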

But Mara measured across simulated workloads with varying contention and this fix improves them all to different extents.

> and if you’re not, why use a mutex?

Because it's an incredibly efficient, safe option for doing so. Lots of shared state is rarely contended. For example, imagine you have a 'Config' that gets updated periodically in the background, readers of that config only check for updates every 1 second, and you have 7 parallel readers (and 1 writer for an 8 core system).

A Mutex is a trivial way to solve that problem that will be extremely efficient.
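A sketch of that setup, under some assumptions: the `Config` struct and its `version` field are invented for illustration, and the once-per-second polling is left out so the example finishes immediately. One writer bumps the config while several readers each take a short lock to copy out a snapshot.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical config type; a real one would carry actual settings.
#[derive(Clone, Debug)]
struct Config {
    version: u64,
}

// Runs one writer applying `updates` changes and `readers` threads that each
// take one snapshot. Returns the final version and the versions readers saw.
fn run_config_demo(readers: usize, updates: u64) -> (u64, Vec<u64>) {
    let config = Arc::new(Mutex::new(Config { version: 0 }));

    let writer = {
        let config = Arc::clone(&config);
        thread::spawn(move || {
            for v in 1..=updates {
                // Lock is held only for this single assignment.
                config.lock().unwrap().version = v;
            }
        })
    };

    let reader_handles: Vec<_> = (0..readers)
        .map(|_| {
            let config = Arc::clone(&config);
            thread::spawn(move || {
                // Copy the snapshot out, then work on it without holding the lock.
                let snapshot = config.lock().unwrap().clone();
                snapshot.version
            })
        })
        .collect();

    writer.join().unwrap();
    let seen: Vec<u64> = reader_handles.into_iter().map(|h| h.join().unwrap()).collect();
    let final_version = config.lock().unwrap().version;
    (final_version, seen)
}

fn main() {
    let (final_version, seen) = run_config_demo(7, 100);
    println!("final version {final_version}, readers saw {seen:?}");
}
```

Which versions the readers observe depends on scheduling. The point is only that each critical section is tiny, so the lock is almost never contended.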

Don't atomic operations trigger cache synchronisation in CPUs? Doesn't that affect performance negatively? That would mean even a non-contended mutex would affect performance negatively. I suspect it depends a lot on the specific workload (and maybe even what addresses data is stored at in memory), so I'd measure the specific case, but that's my a priori gut feeling.

If it's non-contended, the mutex's cache line probably stays Exclusive in the local CPU and acquiring is pretty cheap.

If there's no contention there's no more synchronization compared to any other cache lines afaik.

Atomic operations have some overhead.

The overhead of atomics is almost (if not entirely?) exclusively with regards to managing the caches in the CPU. Otherwise they're just normal bytes. Your CPU already has to do some cache management with regular bytes, so an atomic is only worse if there's contention (because that forces a flush).

The worst case for an atomic write is two additional cache line flushes, iirc.