Hacker News
by ohazi 1469 days ago
A dig against Rust I sometimes hear is "Oh, data race freedom isn't such a big deal, if you really need it, a garbage collected language like Java will give you that guarantee."

So now I'm hearing that Go, a garbage collected language, doesn't guarantee data race freedom? I guess it's garbage collected but not "managed" by a runtime or something?

Why go to all that effort to get off of C++ just to stop 30% short? These are C-like concurrency bugs, and you still have to use C-like "if not nil" error handling.

Why do people keep adopting this language? Where's the appeal?

6 comments

Data races and garbage collection are unrelated. A data race occurs when:

> two or more threads in a single process access the same memory location concurrently, and at least one of the accesses is for writing.

Rust provides compile-time protection against data races with the borrow checker. Go provides good but imperfect runtime detection of data races with the race detector. Like most things in engineering, either approach requires a trade-off involving language complexity, safety, compile-time speed, runtime speed, and tooling.
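
To make the definition concrete, here is a minimal Go sketch, assuming a hypothetical helper `countWithMutex`. Several goroutines increment one shared counter, and a `sync.Mutex` ensures that no two accesses are concurrent. Dropping the lock would turn `total++` into exactly the kind of data race quoted above, which a run with `go run -race` or `go test -race` would usually (not always) report.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex is a hypothetical helper: it starts `workers` goroutines
// that each add 1 to a shared counter `perWorker` times.
func countWithMutex(workers, perWorker int) int {
	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		total int
	)
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				mu.Lock()
				total++ // guarded write: never concurrent with another access
				mu.Unlock()
			}
		}()
	}
	wg.Wait() // Wait also orders the goroutines' writes before the read below
	return total
}

func main() {
	fmt.Println(countWithMutex(8, 1000)) // prints 8000
}
```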

They're unrelated in theory, but in practice a lot of garbage collected languages do try to turn data races into defined behavior. Java requires the JVM to implement some defined semantics for data races, though I think they're still considered terribly confusing in practice. Python prevents data races with the GIL, and JS prevents them by either not having threads at all or not letting them share memory. I think Go is actually somewhat unique among modern, GC'd languages in that data races in Go are true UB (albeit with lots of best-effort checks).
Java promises that any variables touched by a data race are still valid and that your program still runs, but it offers no guarantees about what value those variables have, so the signed integer you're using to count stuff up from zero might be -16 now, which is astonishing. Still, your program definitely won't suddenly branch into a re-format disk routine for no reason, as it would be allowed to do in C or C++.

Go has different rules depending on whether you race a primitive (like int) or some data structure, such as a slice, which has moving parts inside. If you race a data structure you're screwed immediately: this is always Undefined Behaviour. But if you race a primitive, Go says the primitive's representation is now nonsense, and so you're fine if you don't look at it. If you do look at it and all possible representations are valid (e.g. int in Go is just some bits and all possible bit values are ints, whereas bool not so much), you're still fine, but Go makes no promises about what the value is. Otherwise that's Undefined Behaviour again.

I don't think Go is really unique here. Java put a lot of work into delivering the guarantees it has, and since they turned out to be inadequate for reasoning about programs which don't exhibit Sequential Consistency, that was work wasted. Most languages which don't have the data race problem simply don't have concurrency, which is, well, not cheating, but it makes them irrelevant. C has "Sequential Consistency" under this constraint too.

> so the signed integer you're using to count stuff up from zero might be -16 now, which is astonishing

Actually, if it is an int, it is guaranteed never to hold a value that was not explicitly written to it (Java has no-out-of-thin-air guarantees for 32-bit primitives). In practice, on every modern implementation this is true of 64-bit primitives as well.

So the prototypical data race of incrementing a primitive counter from n threads can lose counts, but will never have any value outside the 0..TRUE_COUNT range.
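
The lost-count case can be modeled without any real threads. The sketch below is a sequential Go model (not Java, and not an actual race) of one unlucky interleaving, where two logical threads both read the counter before either writes back. The function name `lostUpdate` is made up for illustration.

```go
package main

import "fmt"

// lostUpdate replays one unlucky interleaving of two unsynchronized
// `counter++` operations, each of which is really load, add, store.
func lostUpdate() (observed, trueCount int) {
	counter := 0

	a := counter    // thread A loads 0
	b := counter    // thread B loads 0, before A has stored anything
	counter = a + 1 // thread A stores 1
	counter = b + 1 // thread B stores 1, overwriting A's increment

	return counter, 2 // two increments were attempted
}

func main() {
	observed, trueCount := lostUpdate()
	fmt.Println(observed, trueCount) // prints: 1 2
}
```

One increment is lost, yet the observed value still lies between 0 and the true count, which matches the bound described above.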

Ooh, I did not know this. Do you happen to know where the "no-out-of-thin-air" guarantee is for the 32-bit primitives? Presumably in the Memory model docs somewhere?
I quickly glanced at the spec but didn’t find it. I didn’t make it up, though: I remember reading it on the JVM mailing list, and I found it described here by Brian Goetz himself: https://openjdk.org/projects/valhalla/design-notes/state-of-...
Java code actually consciously tolerates data races for performance reasons, the prototypical example being the implementation of String#hashCode() (racy single-check idiom).
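
A rough Go analogue of the single-check idiom, as a sketch only. Assumptions: the type `cachedString` and its method are hypothetical. The hash is the `31*h + c` recurrence applied to bytes, which matches `String#hashCode()` only for ASCII input. And because a plain racy read is a data race under the Go memory model, the cached field goes through `sync/atomic` rather than a bare field access as in Java.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// cachedString is a hypothetical immutable string wrapper with a lazily
// computed hash. Zero means "not computed yet", as in the Java idiom.
type cachedString struct {
	s    string
	hash uint32
}

// Hash checks the cache once. Several goroutines may compute the hash at
// the same time, but they all store the same value, so no lock is needed.
func (c *cachedString) Hash() uint32 {
	if h := atomic.LoadUint32(&c.hash); h != 0 {
		return h
	}
	var h uint32
	for i := 0; i < len(c.s); i++ {
		h = 31*h + uint32(c.s[i])
	}
	atomic.StoreUint32(&c.hash, h)
	return h
}

func main() {
	c := &cachedString{s: "abc"}
	fmt.Println(c.Hash(), c.Hash()) // prints: 96354 96354
}
```

As in the Java idiom, a string whose hash happens to be 0 is simply recomputed on every call.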
Memory safe, excellent tooling, excellent base std library, no manual memory management, static binaries, trivial cross compilation, trivial concurrent programming etc. These are still considerable advantages over C, although not over C++. Yes, it is not as "safe" as Rust. But the downside of C++ and Rust is that they are harder to design for upfront.
> Memory safe [...] trivial concurrent programming

As this post demonstrates, data races are trivial and common in Go.

Because many of the core types (interface, slice, map) are non-thread-safe multiword structures, they also break memory safety: https://blog.stalkr.net/2015/04/golang-data-races-to-break-m...
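
One conventional way to keep a shared map safe is to hide it behind a lock. A minimal sketch, assuming a hypothetical `safeCounts` type. The standard library's `sync.Map` is an alternative for some access patterns.

```go
package main

import (
	"fmt"
	"sync"
)

// safeCounts is a hypothetical wrapper that serializes all map access.
type safeCounts struct {
	mu sync.RWMutex
	m  map[string]int
}

func newSafeCounts() *safeCounts {
	return &safeCounts{m: make(map[string]int)}
}

// Inc takes the write lock because it mutates the map.
func (c *safeCounts) Inc(key string) {
	c.mu.Lock()
	c.m[key]++
	c.mu.Unlock()
}

// Get takes the read lock, so readers do not block each other.
func (c *safeCounts) Get(key string) int {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.m[key]
}

func main() {
	counts := newSafeCounts()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counts.Inc("hits")
		}()
	}
	wg.Wait()
	fmt.Println(counts.Get("hits")) // prints 100
}
```

Nothing stops other code from touching the inner map directly if it is ever exposed, so the safety here rests on convention rather than on the compiler.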

I’ve been programming in Go for a decade at this point, and I’m sure I’ve probably run into a data race before, but for the life of me, I can’t think of any specific instances. I’m not sure they’re as common as you think.

Far more often, I’ll run into race conditions in some service (multiple processes touching some network state concurrently), but this happens as often in Go as in Rust or any other language.

I suspect if you're not particularly looking for data races, you probably won't recognize their effects when these bugs occur. There is a very large set of C and C++ apps which don't run ASan or UBSan and have a long tail of bugs that are closed as "can't repro" or "probably fixed by x" that are actually the result of UB.
> trivial concurrent programming

If it is chock-full of race conditions, it is not trivial.

> "Oh, data race freedom isn't such a big deal, if you really need it, a garbage collected language like Java will give you that guarantee."

Java programs aren't guaranteed to be free of data races. The Java spec guarantees that if one happens, there will be no undefined behavior (as there would be in C++).

Hm, maybe I was misremembering. Managed languages like Java do give you memory safety, but I guess data race freedom isn't actually guaranteed.

Now that I think about it, this must be the case, right? You have to get `synchronized` right in Java or else you won't get what you expect.

There's a specified Java memory model (JMM), and when programmers shoot themselves in the foot with concurrency, e.g. through buggy synchronization, it provides just enough guarantees to protect the integrity of the state of the runtime itself.

(I didn't find this integrity of the runtime specified in the JMM spec; hopefully it's in the other specs.)

In the JMM terminology, the "you're in the clear" term is "well-formed execution". If you break the rules, you're not in "well-formed execution" land any more, and things may fly out of your orifices, but a specific type of C/C++-style dragon probably won't fly out of your nose.

So there's a weak kind of memory safety: your app data may still be garbled, possibly in an attacker-controlled way, but the attacker probably won't get remote code execution.

There's a tricky distinction here. I'm pretty sure Java does provide data race freedom, in the specific sense that a data race is "Undefined Behavior caused by a write overlapping with another read or write". The Java standard says that the JVM isn't allowed to trigger this sort of undefined behavior. (Maybe some people say it's technically still a data race? I'm not sure of the right formal definition, but anyway the important thing is that in Java the UB doesn't happen here.) However, what happens when you do that can still be extremely tricky, and I think the Java compiler is still allowed to reorder reads and writes in ways that'll be extremely confusing if you have code that looks like a data race. It won't give an attacker arbitrary code execution, but it's very likely still a bug.
Data Race Freedom is, unsurprisingly, Freedom from Data Races. A Data Race is any time when there's concurrent modification of a memory value; on modern hardware with multiple simultaneous execution contexts, those modifications could in some sense happen at the same moment.

[NB: Data Races are a subset of Race Conditions. Race Conditions are sometimes just a fact about the world, and you need to write programs that cope with them, but they are not necessarily Data Races. If you copy all the files from folder A to folder B and then delete folder A, and somebody meanwhile adds a file to folder A, which you then delete without having copied it, that is a Race Condition, but it is not a Data Race.]

The reason you want Data Race Freedom is that it's easy for a programming language to offer Sequential Consistency if you have Data Race Freedom; this guarantee is called SC/DRF.

Why do we want Sequential Consistency? Sequential Consistency is when programs behave as if stuff happened in some sequence. The disk reader gets a block from disk and then the encryptor applies AES/GCM to the block and then the network writer sends the encrypted block to the client. It turns out humans value this very much when trying to reason about any non-trivial program. Get rid of Sequential Consistency and the programmers are just confused and can't solve bugs.

So, we want SC/DRF and in most languages you get that by being very careful to obey the rules to avoid Data Races. If you screw up, you don't have Sequential Consistency. In most languages you lose more than that (in C or C++ you immediately have Undefined Behaviour, game over, all bets are off), but even just losing Sequential Consistency is very bad news.

Safe Rust promises DRF and thus SC. So instead of being very careful you can just write safe Rust.

AFAIK, you still have to be very careful since data races based on data dependencies can never be excluded in general, that is theoretically not possible. What you get is a guarantee that your program is not in an undefined state. There are still plenty of ways to shoot yourself in the foot with incorrect synchronization.
> AFAIK, you still have to be very careful since data races based on data dependencies can never be excluded in general

Hmm. Maybe I don't understand what you're getting at here. It seems like you're suggesting something like a[b] = x could race in safe Rust because we don't know b in advance and maybe it ends up being the same in two threads?

But Rust's borrow checker won't allow both threads to have the same mutable array a so this is ruled out. You're going to have to either give them immutable references to a, which then can't be modified and so there's no data race, or else they need different arrays.

This is boringly easy to get right in theory; Rust just has to do a lot of work to make it usable while still delivering excellent runtime performance.
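
For comparison, Go has no borrow checker, but the same "different arrays" discipline can be followed by hand by giving each goroutine its own disjoint piece of the data. A sketch, with the hypothetical helper `fillSquares`, assuming `workers` is greater than zero. Nothing here is enforced by the compiler, which is the contrast with safe Rust.

```go
package main

import (
	"fmt"
	"sync"
)

// fillSquares is a hypothetical helper: it writes i*i into out[i], splitting
// the slice into disjoint chunks so no two goroutines touch the same element.
// It assumes workers > 0.
func fillSquares(n, workers int) []int {
	out := make([]int, n)
	chunk := (n + workers - 1) / workers // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out[i] = i * i // only this goroutine writes indices [lo, hi)
			}
		}(start, end)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(fillSquares(6, 4)) // prints [0 1 4 9 16 25]
}
```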

> AFAIK, you still have to be very careful since data races based on data dependencies can never be excluded in general, that is theoretically not possible.

Well, you could always require the programmer to supply a proof that the program is gonna be fine, before you compile anything.

(That means your programming language won't be Turing complete, but you can still code up anything you want in practice. Including Turing machines.)

The likes of Agda and Coq work in this way.

> A Data Race is any time when there's concurrent modification of a memory value

I do want to nail down the terminology, so help me with this scenario: Two simultaneous relaxed atomic writes to the same variable from different threads. To my understanding, this is not a data race (since this is allowed, while data races are never allowed), but it is concurrent. Do I have that right?

Well spotted. This is arguably a hole, albeit a deliberate one. In practice the main reason people do this is collecting some sort of metric, whose exact value is unimportant and which anyway isn't contemplated by the machine.

If your program tries to actually act on this data then yeah, you have successfully made your own life unnecessarily exciting and debugging your program may be difficult. I think it's fair to say you've only yourself to blame though since you had to explicitly choose this.
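
For the metrics case, a Go sketch. Note the assumption: Go's `sync/atomic` does not expose a relaxed ordering (its operations behave as sequentially consistent), so this shows the same pattern with stronger ordering than the relaxed writes discussed above. The `metrics` type is hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// metrics is a hypothetical counter that many goroutines update concurrently.
type metrics struct {
	requests int64
}

// RecordRequest is a concurrent update, but not a data race: it is atomic.
func (m *metrics) RecordRequest() {
	atomic.AddInt64(&m.requests, 1)
}

// Requests returns a snapshot that may already be stale when it is used.
func (m *metrics) Requests() int64 {
	return atomic.LoadInt64(&m.requests)
}

func main() {
	m := &metrics{}
	var wg sync.WaitGroup
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			m.RecordRequest()
		}()
	}
	wg.Wait()
	fmt.Println(m.Requests()) // prints 50
}
```

Reading the counter for a dashboard is harmless. Branching on it to make decisions is where the "unnecessarily exciting" debugging starts.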

> there will be no undefined behavior

and no out-of-thin-air values.

> Why do people keep adopting this language? Where's the appeal?

There're many aspects to consider when evaluating a tool. To me, Go has one of the best overall packages:

- std lib
- tooling
- performance
- concurrency
- relatively easy to get devs
- reliable
- mature

Also, Go has no substantial drawback. I personally consider an external runtime a drawback, for example.

I also use Rust personally. This discussion shows the value of Rust in terms of correctness. But for my professional projects Rust lacks the ecosystem guarantees that Go has with its great and useful standard lib. Looking at the Cargo dependencies of a mid size Rust web service is scary and reminds me of NPM. A large fraction of essential libs are maintained by a single person. Or unfortunately unmaintained. Rust with Go's std lib would be truly great.

If someone is claiming that garbage collection means freedom from data races, they are unambiguously wrong.

Garbage collectors solve double-free bugs and usually memory leaks due to cyclic references.

Yes. Though garbage collectors can have an indirect influence on the design of the language that makes it easier to handle data races.

(As an example, imagine how much simpler Rust would be if they went with garbage collection. Or how much more machinery Haskell would need if they went with Rust's memory management strategies.)

I'm not sure what Rust would gain from a garbage collector - it'd still need all of lifetimes for instance, because ownership is the necessary piece for preventing races.
Go has some appeal in general. It's super easy to stand up webby glue appy things in go, and it has a solid cloud ecosystem, maybe the best cloud ecosystem.

That said, people ragging on rust pushing that trope are basically just making stuff up to hate on it. Anyone who looks into the language and views programming languages as tools and understands these issues gets why someone might use rust.

But yea, it's ironic... Especially seeing how many times I've seen smart colleagues get go concurrency wrong.

I’ve been dabbling with Rust on and off since about 2014, and I was actually surprised at how difficult multithreading still is in the language. It’s neat that it stops you from doing things that could be incorrect, but I couldn’t get anything to compile in the first place, even with mutexes guarding the shared memory (it’s been ~6+ months since I last tried and I’ve forgotten the details except that I ended up reverting to a single-threaded implementation).