Hacker News
by pjmlp 1636 days ago
Data races across threads, yes. But Rust's type system does nothing to prevent data races across processes.
3 comments

That's not entirely correct, or at least it's misleading. Rust provides the same guarantees for variables in memory shared between processes as for variables in memory shared between threads. But you have to make sure that any locking data structures you're using (like mutexes or read-write locks) are able to work across processes.

(There are limits, though: if you map the same physical addresses to different virtual addresses, Rust can't help you. However, that is independent of threads/processes, because you can also do that in single-threaded programs.)

Which is a different story than just asserting fearless concurrency no matter what, which is also misleading.

Hence why I try to make the point that it comes with a footnote.

Rust is after all supposed to target all kinds of system programming scenarios.

> Which is a different story than just asserting fearless concurrency no matter what, which is also misleading.

Frankly, you're being a bit disingenuous. Nobody claimed that Rust can or will solve all conceivable concurrency problems. "Fearless concurrency" is generally understood to mean "...within a single program", not "...across different processes/machines/networks". By the time you understand what interprocess shared memory is, you're well able to correctly interpret Rust's "fearless concurrency" slogan.

Understood by most in the Rust community, not by others.

Most outside of the community aren't aware that the Nomicon points out exactly this.

By the way, there are also ways to cause havoc within a single program, for example using a file as a backing store that is accessed by multiple threads concurrently, or accessing database data without transactions.

My goal is not to bash Rust, but rather to trigger discussions around these kinds of problems.

Both of those situations are race conditions, but neither is a data race. Rust only prevents data races, which are a specific kind of race condition; it does not prevent race conditions in general.
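
To make the distinction concrete, here is a minimal sketch (the function name `lost_update_demo` is made up for illustration, and the `Barrier` exists only to force the unlucky interleaving every time). Every access to the counter goes through a `Mutex`, so there is no data race and the code is entirely safe Rust, yet the read-then-write sequence as a whole is not atomic and one update is lost:

```rust
use std::sync::{Arc, Barrier, Mutex};
use std::thread;

/// Two threads each read the counter and later write back `read + 1`.
/// Each individual access is mutex-protected, so there is no data race,
/// but the read-modify-write sequence as a whole is not atomic.
fn lost_update_demo() -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    // The barrier is only here to make the bad interleaving deterministic.
    let barrier = Arc::new(Barrier::new(2));

    let handles: Vec<_> = (0..2)
        .map(|_| {
            let counter = Arc::clone(&counter);
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || {
                // The lock guard is dropped at the end of this statement.
                let seen = *counter.lock().unwrap();
                // Both threads finish reading before either one writes.
                barrier.wait();
                *counter.lock().unwrap() = seen + 1;
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let final_value = *counter.lock().unwrap();
    final_value
}

fn main() {
    // Two increments were attempted, but this prints "final counter = 1".
    println!("final counter = {}", lost_update_demo());
}
```

Holding the lock across both the read and the write would fix this particular case, but the compiler has no way to know that this is what the program intended.
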
It is the "in general" that I care about, and I think it gets too little discussion in the community because, just like in some RIR threads, the details get lost in the discussion.
When can data races across processes happen?

Are you talking about databases, services or IO and such?

I guess the simplest example is shared memory between processes.

Even Python has it: https://docs.python.org/3/library/multiprocessing.shared_mem...

Access to raw memory is locked behind the unsafe keyword though. Rust officially already does not guarantee any safety in that scenario even within 1 process.
There is always unsafe at some level in the standard library.

The point is that it doesn't protect the user of a crate that only exposes a fully safe API, unless they dig in to validate the safety of the overall architecture.

>Even Python

It's not "even". Python specifically has it because it has no real threading.

Python does have real threading. The `threading` module provides OS-level threads and synchronization primitives. The only difference between this and multithreading in C or Java is that CPython's GIL prevents more than one thread from executing bytecode at a time. This prevents parallelism, but not concurrency.

Note that this does not mean that Python code is thread-safe by default. At most, you can theoretically rely on bytecode operations being atomic, which means you'll need to synchronize multi-threaded code with mutexes, semaphores and higher-level synchronization constructs.

Python has cooperative threading. It's the same threading model used in the Erlang VM, Julia and many other dynamically typed languages. But preemptive threading vs. cooperative threading is orthogonal to whether data races can happen. Java threads are preemptive but data races can still happen.
The Erlang VM does preemptive scheduling.
No it doesn't.
SharedMemory is a new thing in Python (added in 3.8). Not even supported by all 3.x versions.
But this is specifically about Rust.

What data races between processes, other than Disk/IO, databases, or external services, can a Rust program have?

I explicitly exclude the whole category of external services, since that is "by design" really, and is the whole reason for ACID, global mutexes, transactions and CRDTs.

That, and any kind of data structure that can be shared via IPC mechanisms, some of which are even transparent to the processes.
Environment variables.

Locales.

Quite a few other POSIX bits, really.

It is not possible to have a data race with environment variables across multiple processes. Every process has its own copy of environment variables (in fact they have their own copy of the entire environment).
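
A small sketch of the "own copy" point, assuming a Unix-like system with `sh` on the `PATH`. The variable name `HN_DEMO_VAR` and the function name are made up for illustration, and the sketch assumes the variable is not already set in the parent. The child inherits a value, overwrites it in its own environment, and the parent's environment is unaffected:

```rust
use std::env;
use std::process::Command;

// Hypothetical variable name, assumed not to be set in the parent already.
const VAR: &str = "HN_DEMO_VAR";

/// Runs a child shell that prints the inherited value, overwrites it, and
/// prints it again. Returns the child's output and whether the parent's own
/// environment contains the variable afterwards.
fn child_env_is_private() -> (String, bool) {
    let script =
        r#"printf '%s -> ' "$HN_DEMO_VAR"; HN_DEMO_VAR=changed-in-child; printf '%s' "$HN_DEMO_VAR""#;

    let output = Command::new("sh")
        .arg("-c")
        .arg(script)
        // Sets the variable in the child's environment only.
        .env(VAR, "from-parent")
        .output()
        .expect("failed to run sh");

    let child_view = String::from_utf8_lossy(&output.stdout).into_owned();
    // The parent looks at its own environment after the child has exited.
    let parent_has_var = env::var_os(VAR).is_some();
    (child_view, parent_has_var)
}

fn main() {
    let (child_view, parent_has_var) = child_env_is_private();
    println!("child saw: {child_view}");
    println!("parent has {VAR}: {parent_has_var}");
}
```

Nothing the child does to its environment is visible to the parent, so there is no shared state for the two processes to race on.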

I'm not sure what data race is possible across processes with locales, that's too vague of a claim to make.

One type of locale I know of is the LC_ env vars. So "ENV is a copy" applies there too.

Another would be reading and writing locale files, such as JSON. But then the same applies as with any database or IO: this is inherently race-condition-prone, and that is by design.

Maybe grandparent is thinking about locales in many web frameworks, where the locale is some global var which should not be shared across users. So if you set `Locale.current = "EN_GB"`, that applies to any (email) notifications, errors, files, responses or such that are sent out during that request/response and during any jobs that request/response may spawn. In e.g. Rails this "somewhat global var" is a Frankenstein, but it actually works surprisingly reliably.

It would be interesting to see whether one can come up with a solution based on custom `Send`/`Sync`-like traits.

Of course, it will require nightly, since auto traits are not stable.
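
As a rough idea of what such a trait could look like, here is a sketch that approximates it on stable Rust with a manually implemented `unsafe` marker trait rather than a real auto trait. The names `ProcessSync` and `SharedRegion` are hypothetical, and `SharedRegion` just boxes the value; it is a stand-in for a real shared-memory mapping, which this sketch does not create:

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hypothetical marker: "this type may live in memory shared with another process".
///
/// # Safety
/// Implementors must not contain pointers or references into this process's
/// address space and must tolerate concurrent access from other processes.
pub unsafe trait ProcessSync {}

// Opt in type by type. A real auto trait would be derived structurally instead.
unsafe impl ProcessSync for AtomicU32 {}
unsafe impl<T: ProcessSync, const N: usize> ProcessSync for [T; N] {}

/// Stand-in for a handle to a shared mapping. It only boxes the value; the
/// point is the `T: ProcessSync` bound, not the storage.
pub struct SharedRegion<T: ProcessSync> {
    inner: Box<T>,
}

impl<T: ProcessSync> SharedRegion<T> {
    pub fn new(value: T) -> Self {
        SharedRegion { inner: Box::new(value) }
    }

    pub fn get(&self) -> &T {
        &self.inner
    }
}

fn main() {
    let counters = SharedRegion::new([AtomicU32::new(0), AtomicU32::new(0)]);
    counters.get()[1].fetch_add(1, Ordering::SeqCst);
    println!("{}", counters.get()[1].load(Ordering::SeqCst)); // prints 1

    // This would be rejected at compile time, because `std::sync::Mutex`
    // has no `ProcessSync` impl in this sketch:
    // let bad = SharedRegion::new(std::sync::Mutex::new(0u32));
}
```

The manual `unsafe impl` lines are exactly the part that an auto trait on nightly would take over, along with negative impls for types such as references and raw pointers.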