Hacker News new | ask | show | jobs
by Twey 89 days ago
But there are data races.

Mem as I set it out above is an example of multi-threaded, mutable data. Another process can't alter the data you've received, indeed, just as in a shared-memory system another thread can't alter the data after you copy it out of shared memory into your registers or an unshared memory space. But if you're storing the data in shared memory (a.k.a. requesting it from Mem every time) then you are racing with other actors who have access to Mem.

1 comments

A data race implies getting results that are not allowed by the single-threaded semantics of the API - corruption, basically. It sounds like that's not possible because the API is strictly single threaded, and any data you receive is copied (I think?) so it can't be mutated afterward. Which also means it's not shared memory - you are not sharing ownership of it, you're exclusively using copies.

You're describing a logical race, something unintended but allowed due to interleaving behavior, not a data one. Logical races are a heck of a lot harder to prevent without going fully proof-oriented.

The shared memory in the Mem setup is not the local copies of the values after reading from Mem (just as the registers or private memory ranges of a thread are not shared) but (the internal state of) the Mem actor itself.

To phrase my point here a bit differently, the distinction between a data race and a logical race is purely the scope of the expected consistency. In the standard multi-threaded case as well, if you read a single word from the memory it will definitely be consistent — you'll get the result of either one write or another, never half of the word from one write and half from the other. But values in real programs are usually bigger than a single word, and once you read two words from the memory it becomes possible (ignoring for the sake of argument the facilities modern memory provides for reading larger memory ranges atomically; feel free to replace ‘word’ with your atomic unit of choice) to get one word from one write and one word from another. As words these are consistent — they're both valid words and they both come from an intentional write — but put together they may not form a coherent value of the higher-level type you meant to read. This is how you get ‘corrupted’ data: the notion of corruption comes not from the memory semantics but from the semantics of the type that the memory is being used to represent. This applies exactly the same to Mem: each word is, as you say, consistent, but if you try to read a value by doing two reads, the resulting value may be ‘corrupt’ (formed by the interleaving of two writes) depending on what other threads are doing with Mem at the time.

That would be this:

>you can emulate a Turing-complete computer in a Turing-complete language

If you've created semantics that allow for incomplete reads/writes, you allow incomplete reads/writes.

The semantics don't seem to exist natively though - if you write a large object in one message and read it in another, is it possible to read part of your object, and part of another that some other message wrote, meaning you get data that no message ever wrote? Or is it always some "real" value?

No, a message is always consistent, both in the actor model and in hardware memory. My point is just that if you restrict your messages to those supported by hardware memory, you get exactly the same semantics when sharing the actor as you do for hardware shared memory. However, the actor model _also_ allows you to define higher-level messages that are consistent in a way that is defined by the application semantics.

>> you can emulate a Turing-complete computer in a Turing-complete language

> If you've created semantics that allow for incomplete reads/writes, you allow incomplete reads/writes.

That's true, but isn't entirely relevant to the discussion for a few reasons. First, Turing-completeness does not require the existence of race conditions; it is perfectly possible to have a Turing-complete language (for example, standard Turing machines!) that doesn't have the ability to define data races. Second, my entire point here is that you don't have to ‘create semantics that allow for incomplete reads/writes’. That semantics is a subset of the standard actor semantics. In fact, I am claiming more broadly that all reads/writes are either complete or incomplete (to borrow your terminology) depending on your semantic perspective. The reads/writes of standard hardware shared memory are also ‘complete’ from the perspective of reading/writing individual words; the problem is just that almost all applications need a notion of ‘completeness’ that is higher-level than that, which gives rise to the (relativistic!) notion of ‘data race’.