Hacker News new | ask | show | jobs
by Manishearth 3289 days ago
There are three ways of sharing data across threads.

One is by sharing the data with the thread when it is spawned via a closure. Spawning will fence. No problem there.

The second is to use a good 'ol Sender/Receiver channel pair. These are effectively a shared ring buffer that you can push to and pop from. They also have a fence somewhere.

Finally, you can stick your data into a mutex shared between threads (and let the other thread wait and read it). This will IIRC fence, or do something equivalent.

You can of course build your own ways to do this, but they will need unsafe code to be built (the three APIs above are also built with unsafe code). It is up to you to ensure you handle the fences right when doing this.

The responsibility here is on the publishing mechanism. Most folks use one of the three ways above using primitives from the stdlib depending on the use case.

1 comments

Yeah, I understand and what I expected to be the answer. My point is that when people talk about Sync not allowing data races, there's the asterisk attached to that statement. That footnote is that publishing code, which is completely separate from the type itself, needs to uphold its responsibility. Unsafe code is usually discussed in light of raw pointers and more generally raw memory ops, but I rarely see this aspect mentioned.
My point above was subtle but important: the asterisk you're mentioning here is not specific to Sync. This is true of all safety guarantees in Rust. unsafe code must uphold the invariants that safe code will rely on, otherwise it's buggy. For example, if the implementation of `Vec<T>` accidentally got its internal `length` out-of-sync with the data on the heap, then nothing bad in and of itself necessarily happens immediately. The bad thing only happens the next time you try to do something with the `Vec<T>`, which will be in safe code.

The safety of things in Rust is built on abstraction. If abstraction gets something wrong in its `unsafe` details, then there is a bug there. In other words, the asterisk you're mentioning is "You can trust that safe Rust is free of memory safety, unless there are bugs." I feel like that's discussed and acknowledged quite a bit.

> Unsafe code is usually discussed in light of raw pointers and more generally raw memory ops, but I rarely see this aspect mentioned.

I guess I don't see a difference. The safeness of Rust code depends on the correct use of `unsafe`, and this applies to everything, not just Sync.

This idea of `unsafe` being "trusted code" is one of the first things that the Rustonomicon covers: https://doc.rust-lang.org/nomicon/safe-unsafe-meaning.html

> I guess I don't see the difference

At a high and general level, yeah, it's all "unsafe". But, most conversations about unsafe don't talk about this aspect. So while what you say is true, I'm merely pointing out that this concurrency aspect doesn't seem to be mentioned much. And while it's implied at a high level, I think it's worth mentioning.

Basically, there's no issue - I just think this should be called out more when concurrency is discussed.

You need raw pointer ops (or, well, dealing with UnsafeCell, which does raw pointer stuff), or syscalls for these too.

Concurrency is not special here. There are all kinds of invariants unsafe code might be required to uphold. So yeah, we could mention concurrency, but then we could also mention UTF8, noalias, initialization, the vector length invariant, the HashMap robin hood invariant, various BTreeMap invariants, etc etc. "Make sure you have fences" is just another semi-specific invariant.

I disagree that "most conversations about unsafe don't talk about this aspect", compartmentalizing unsafe invariants is a major part of these discussions (it's like the first chapter of the nomicon, even)

> Concurrency is not special here I beg to differ. Concurrency comes with its own bag of hazards, as I mentioned in my reply to burntsushi. Comparing its invariants with Vec's length invariant/HashMap's RH invariant, and any other single threaded/internal invariants misses the point.

Unsafe discussions that I've seen rarely talk about fences - they tend to focus on raw pointer ops, ffi, transmutes, unbound lifetimes, and in general, are single thread focused.

Right, but I can make the same point about other invariants. Each comes with its own bag of hazards. You can write pages about the robin hood invariants.

Concurrency is particularly complex, perhaps. I think one of the reasons you don't see that much discussion of this is that in general folks in Rust don't write that many internally-unsafe concurrent abstractions. There are a bunch of great safe building blocks out there (stdlib ones, rayon, crossbeam) which folks use for concurrency; it's very rare to build your own. So that might be it.

At least with the stuff I work on like 50% of the unsafe Rust discussions have been around thread safety and ordering and fences, but we're in that relatively rare situation where we need to build those abstractions, so perhaps it's just me who sees these discussions happening.

-----

It's also probably just that discussions introducing unsafe will deal with problems people are used to -- and memory safety is a far more "normal" problem than thread safety.

Could you say more about why you see concurrency as a special thing to call out? I'm genuinely curious so that I can adapt how I explain this stuff in the future. It would help to understand the distinction you are seeing that I am not. :-)
Only because I've had several people ask me how Rust handles memory ordering after they've learned of Sync. Sync documentation focuses very narrowly on there being no data races against the type that implements it. There's also obviously a difference between a struct consisting of a single AtomicUInt (for example) and a struct consisting of a plain uint. The former can be published unsafely because it internally already provides the appropriate fences by virtue of the atomic. In the latter, it requires the publishing to provide the necessary fences. Again, I don't mean to say that doesn't make sense or is unexpected (after one thinks about it), but I'd urge a bit more focus on that part as well.

Given there's no official memory model, similar to say Java Memory Model, there's not much to go by (correct me if I'm wrong). The JMM, for instance, spells out what needs to happen for happens-before edges to be created. It also talks about racy publication of values. Granted there's no close analog to Rust's unsafe in Java. But saying "T: Sync means it's free of data races in safe code", while correct, is a slightly vacuous statement since that T interacts with other components. And yes, those components will likely involve unsafe code, and unsafe code has its own caveats, but still, I don't think it hurts to make this more pronounced. Concurrency is a special beast, rife with its own hard-to-debug hazards. Being a bit more verbose and possibly repetitive about the hazards won't hurt :).

I don't think it's vacuous. It's very important for compartmentalizing unsafe. It interacts with other components, and that statement tells you the responsibilities of both T implementing Sync, and abstractions accepting Sync types.

It is vacuous at a big picture level where you're trying to understand the complete thread safety story, but that was never what that statement was trying to convey.

----

I think documentation here can be improved, though. When I get more time one of the things I want to do is do a major revamp of the concurrency docs, including paragraphs on how the memory ordering stuff works. I'll include filling out the Sync docs in this, thanks for the feedback!

The atomic type doesn't provide the necessary fences automatically. Operations on it have various sorts of fencing, but these do not necessarily connect in the way you think they do. The partial initialisation problem of creating a struct and publishing a pointer happens for the atomic types too, as they don't (and should not!) do everything with sequential consistency.