by anyfoo 1694 days ago
Heh, 10 years ago I gave a presentation about how easily folks used to x86 can trip up when dealing with ARM's weaker memory model. My demonstration then was with a naive implementation of Peterson's algorithm.[1]

I have a feeling that we will see a sharp rise of stories like this, now that ARM finds itself in more places which were previously mostly occupied by x86, and all the subtle race conditions that x86's memory model forgave actually start failing, in equally subtle ways.

[1] The conclusion for this particular audience was: Don't try to avoid synchronization primitives, or even invent your own. They were neither system-level nor high-performance programmers, so they had that luxury.


But Peterson's algorithm requires explicit memory barriers even on x86, so it doesn't seem the best example to show the difference.
Here are my slides from back then: https://reinference.net/mp-talk.pdf

You made me wonder, because I definitely remember using Peterson's algorithm, so I went back to my slides, and it turns out I first showed the problem on x86, then indeed added an MFENCE at the right place, and then showed how that was not enough for ARM. So the point back then was to show, using x86 as the example, how weaker memory models can bite you, and then how they can still bite you on ARM with its even weaker model (ARMv7 at that time; C11 atomics aren't mentioned yet either, but the older OS-specific support for atomics is).
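For readers who haven't seen it, here is a minimal sketch of a two-thread Peterson lock, written with Rust's std atomics rather than the raw MFENCE and barrier instructions from the slides (which are not reproduced here). The names `Peterson` and `run_demo` are made up for illustration. It assumes SeqCst operations on the lock path, which is a conservative choice: that ordering is what should keep a thread's store to its own flag from being reordered after its load of the other thread's flag (the x86 problem), while the Release store in `unlock` is there so that writes made inside the critical section are visible before the lock is seen as free (the extra ARM concern).

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical two-thread Peterson lock; `me` must be 0 or 1.
pub struct Peterson {
    flag: [AtomicBool; 2], // flag[i] == true: thread i wants the lock
    turn: AtomicUsize,     // which thread has to wait on a tie
}

impl Peterson {
    pub fn new() -> Self {
        Peterson {
            flag: [AtomicBool::new(false), AtomicBool::new(false)],
            turn: AtomicUsize::new(0),
        }
    }

    pub fn lock(&self, me: usize) {
        let other = 1 - me;
        // SeqCst keeps these two stores ordered before the loads below.
        // With weaker orderings the load of flag[other] could be
        // satisfied before our own flag store becomes visible.
        self.flag[me].store(true, Ordering::SeqCst);
        self.turn.store(other, Ordering::SeqCst);
        while self.flag[other].load(Ordering::SeqCst)
            && self.turn.load(Ordering::SeqCst) == other
        {
            thread::yield_now(); // be polite while spinning
        }
    }

    pub fn unlock(&self, me: usize) {
        // Release: everything written in the critical section is
        // published before the other thread can observe `false`.
        self.flag[me].store(false, Ordering::Release);
    }
}

/// Two threads increment a shared counter `iters` times each.
/// The increment is a deliberately split load + store, so updates
/// would be lost if the lock failed to provide mutual exclusion.
pub fn run_demo(iters: u64) -> u64 {
    let lock = Arc::new(Peterson::new());
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..2)
        .map(|me| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock(me);
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock(me);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Calling `run_demo(10_000)` should return 20_000 when the lock holds. Downgrading the lock-path orderings to Relaxed is the kind of change that may appear to work on one machine and lose updates on another, which is the reason to reach for an existing mutex in real code.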

Damn, my knowledge of ARM and x86 memory models was limited to "x86 provides a stronger model" but that was it. So much to learn, thanks for the slides.
Oh, right, yes, ARM additionally needs a release barrier on the unlock path.
> Don't try to avoid synchronization primitives, or even invent your own.

Makes me wonder if it's really a good idea in most cases to use, for example, the Rust parking_lot crate, which reimplements mutexes and RW locks. Besides the speed boost for uncontended RW locks, particularly on x64 Linux, what I really like about parking_lot is that a write lock can be downgraded to a read lock without letting go of the lock. But maybe I'd be better off sticking with the tried-and-true OS-provided lock implementations and finding another way to work around the lack of a downgrade option.

Unless you're the maintainer of the parking_lot crate, you're not "inventing your own". And since parking_lot is AFAIK the second most popular implementation of mutexes and RW locks in Rust (the most popular one being obviously the one in the Rust standard library, which wraps the OS-provided lock implementations), you can assume it's well tested.
Everything about which people tell you to “not invent your own” must be invented by someone.
I guess it's more about priorities: if you want to build a web app, don't reinvent the underlying protocols. If you want to build an embedded system for monitoring room temperature, don't invent your own locks.

But if you want to make a lock by all means make a lock, just don't go and reinvent the chip architecture...

You're taking a saying too literally.

People who take on the task of writing such a library, and develop it to the point it's a language-wide standard, usually know what they're doing (or learn on the job :)

The popularity of such a library helps test it thoroughly on many platforms in various conditions, so there's a high chance the bugs will be spotted and fixed.

I mean, that's too literal of an interpretation.

It's more like "if you don't need to, don't invent your own [x]." People who like to invent [x] are usually smart enough to understand why that warning is there to begin with, and don't tend to argue with it.

It's not "don't invent it".

It's "be competent before you invent it, because it's hard". And if you aren't, then let someone who is do the inventing.

the best way to _get_ competent is to try

so yes - invent your own synchronization primitives. please.

just don't believe they are correct without being serious about trying to prove they are. and don't hold up your whole project for self-enrichment.

but try to layer as much in as you can.

developers these days are so productive, until they fall down and can't get up. and then they are completely useless.

Disagreed. There are some problems, and low level hardware is one of them, where you need understanding and not just experience.
Don't invent anything unless your job is to invent it. Of course you might not know what your job is until you are in the middle of it.
In that case you should never run any software. Go live as a hunter-gatherer in the wilderness.
I doubt that. Counting processors currently in operation rather than historically, there are far more ARM processors than x86 ones. These stories will become more common, but we certainly won't see a “sharp increase”.
This sort of bug only happens when running a multithreaded program (with shared memory) on a multicore processor.

You do need both for the problem to happen: Without shared memory, there’s nothing to exploit. And with a single core only, you get time-sliced multithreading, which orders all operations.

My point is, that combination was a lot rarer in ARM land before people started doing serious server or desktop computing with those chips.
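To make the "shared memory plus multiple cores" combination concrete, here is a small sketch of the classic message-passing pattern in Rust. The function name `message_passing` and the value 42 are just illustrative. The assumption being shown: the writer publishes data and then sets a flag, and the reader waits for the flag before reading the data. x86 hardware does not reorder stores with other stores or loads with other loads, so a sloppier version tends to get away with it there (compiler reordering aside), whereas ARM may reorder both unless the flag accesses use Release/Acquire as below.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Producer writes a value, then raises a flag; the caller waits
/// for the flag and returns the value it observes.
pub fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let producer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(42, Ordering::Relaxed);
            // Release: the data store above may not be reordered
            // after this flag store.
            ready.store(true, Ordering::Release);
        })
    };

    // Acquire: once we see `true`, the data store is visible too.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    let seen = data.load(Ordering::Relaxed);

    producer.join().unwrap();
    seen
}
```

With the Release/Acquire pair in place, `message_passing()` returns 42. If both flag accesses were Relaxed, the language would no longer promise that, and a weakly ordered multicore machine is where you would expect to eventually see a stale 0.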

Of course. Any such flaws in the Linux kernel or any library used by Android should have been found by now, for example. But the number of ARM processors running developer/server/desktop stacks has been tiny until recently. In my experience, quite a lot of Linux desktop software fails to even build on non-x86_64 machines.
Are you kidding? Arm computers are by far the most common over the past 10 years. Computers are everywhere and servers and home computers account for at most 10% of the market for cpus and microcontrollers.
The vast majority of those are not running multithreaded workloads written by complete randos.
The dominant Arm cores in the world are Cortex-M (or Cortex-R), which are single-core. 99% of the time they sit on a die with less than 512K of SRAM, and run an RTOS or bare metal.

These outnumber x86+Cortex-A by probably a factor of 1,000.

There are almost certainly more multi-core ARM chips than x86 chips around, too, tho.
That's a good question. I guess we would need to compare all the multi-core ARM chips in smartphones & some laptops, vs all the Intel laptops/desktops/servers. Hmmm... tough one.
Virtually all modern (last five years, for the low end) smartphones have multicore chips. It's estimated that there are around _7 billion_ smartphones worldwide, with another 1.5bn sold every year. So I don't think it's a very tough one.