Hacker News
by pron 32 days ago
> In C, we can have a data race on a single thread and without any writes!

You need to distinguish between UB and a race, and I think that's something that discussions of UB miss. Take any C program and compile it. Then disassemble it. You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.

UB is a property of a source program, not the executable. It means that the spec for the language in which the source is written doesn't assign it any meaning. But the executable that's the result of compiling the program does have a meaning assigned to it by the machine's spec, as machine code doesn't have UB.

A race is a property of the behaviour of a program. So it's true to say that your C program has UB, but the executable won't actually have a race. Of course, a C compiler can compile a program with UB in any way it likes, so it's possible it will introduce a race, but if it chooses to compile the program in a way that doesn't introduce another thread, then there won't be a race.

4 comments

> because Assembly doesn't have UB

To be pedantic, old hardware like 6502 family chips (Commodore 64, Apple II, etc) had illegal instructions which were often used by programmers, but what the chip did with them was completely up to the chip, much like with UB.

> illegal instructions... were often used by programmers

Intentionally, with an expected effect? I'd need a citation for that.

Yes, many of those are perfectly stable. For example, the 6502 has an undocumented instruction commonly known as "LAX" which loads both the A and X registers at the same time in a predictable manner in most addressing modes, in the same time and space it would otherwise take to load either of those registers on their own.
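To make the "same time and space" point concrete, here is a rough C model of three zero-page loads. It is only a sketch: the `Cpu` struct and `step` function are made up for illustration, only the N and Z flags are modelled, and the opcode values ($A5 for LDA, $A6 for LDX, $A7 for LAX) and the 3-cycle cost are the commonly documented ones for the NMOS 6502, not something taken from this thread.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified CPU state for illustration only. */
typedef struct {
    uint8_t a, x;     /* accumulator and X index register */
    uint8_t flags;    /* only N (bit 7) and Z (bit 1) are modelled */
    unsigned cycles;  /* running cycle count */
} Cpu;

/* Zero-page opcodes as commonly documented for the NMOS 6502. */
enum { OP_LDA_ZP = 0xA5, OP_LDX_ZP = 0xA6, OP_LAX_ZP = 0xA7 };
enum { FLAG_Z = 0x02, FLAG_N = 0x80 };

static void set_nz(Cpu *cpu, uint8_t v) {
    cpu->flags &= (uint8_t)~(FLAG_Z | FLAG_N);
    if (v == 0)   cpu->flags |= FLAG_Z;
    if (v & 0x80) cpu->flags |= FLAG_N;
}

/* Execute one 2-byte zero-page load.
   Returns 0 on success, -1 for an opcode this sketch does not model. */
int step(Cpu *cpu, const uint8_t zero_page[256], uint8_t opcode, uint8_t operand) {
    uint8_t v = zero_page[operand];
    switch (opcode) {
    case OP_LDA_ZP: cpu->a = v;             break;
    case OP_LDX_ZP: cpu->x = v;             break;
    case OP_LAX_ZP: cpu->a = v; cpu->x = v; break;  /* undocumented: both at once */
    default: return -1;
    }
    set_nz(cpu, v);
    cpu->cycles += 3;  /* same cost for all three in this model */
    return 0;
}
```

In this model, one LAX leaves the registers in the same state as an LDA followed by an LDX of the same address, for half the bytes and half the cycles.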

The benefits of being able to do stuff like this when you need to conserve resources are obvious, and common idioms have formed around their use. Check out https://csdb.dk/release/?id=198357

Some desultory googling turned up:

* https://www.nesdev.org/wiki/CPU_unofficial_opcodes#Games_usi...

* https://hitmen.c02.at/files/docs/c64/NoMoreSecrets-NMOS6510U... (doesn't name any software, but some copy protection schemes were already known to use them)

Some instructions were very useful, and they were simply discovered by programmers who tried out what each instruction did. People did not necessarily have access to documentation in those days!

So any instruction or hardware feature would get used, whether it's "officially" documented or not.

> You end up with an Assembly program that doesn't have any UB, because Assembly doesn't have UB.

I guess that's true if you think of assembly as a more readable form of machine code, but in a practical sense I'd argue that assembly inherits the undefined behaviors of the architecture it represents and of the implementations of that architecture it actually builds for.

IIRC the OG Xbox security was broken partially as a result of undefined behavior in x86: the AMD CPUs that were used in early development would crash or throw an error or something when execution reached the end of the memory space, but the Intel CPU they switched to just rolled over and kept executing from 0.

I specifically said data race, which is a known term of art and a type of language-level UB. It is separate from the races you're thinking about. Just like signed integer overflow or use-after-free, the compiler is allowed to assume data races never happen.
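A small sketch of what "allowed to assume it never happens" means for signed overflow, the simplest of the cases listed above. The function names are made up for illustration. The first check only works if `x + 1` wraps around, which the C standard does not promise, so an optimizer may legally reduce it to `return false`. The other two never evaluate an overflowing expression.

```c
#include <limits.h>
#include <stdbool.h>

/* Relies on wraparound. For x == INT_MAX, x + 1 is UB, so the compiler may
   assume that case never occurs and fold the whole body to `return false`. */
bool next_overflows_ub(int x) {
    return x + 1 < x;
}

/* Same question, asked without ever computing x + 1. Defined for every x. */
bool next_overflows(int x) {
    return x == INT_MAX;
}

/* Adds a and b only when the result fits in an int.
   Returns false and leaves *out untouched otherwise. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Whether a given compiler actually folds `next_overflows_ub` depends on the compiler, version, and flags, which is part of the problem being discussed here.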
The problem is that in the quest to win benchmark games, compilers started to take advantage of UB for all kinds of possible optimizations, and the result is almost as deterministic as LLM-generated code across compiler version updates.
Soooo… Pay attention to the update changelogs?
This isn't an answer. UB is not only code-dependent, but in many cases value-dependent as well. Changing anything about a program has the potential to cause UB anywhere in the code graph affected. So even the smallest possible change requires you to fully understand that entire graph, as well as the entire compiler history and how it interacts with your program. Remember, the compiler isn't required to diagnose UB, and runtime sanitizers don't catch everything, nor do exhaustive testing and static analysis.
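As a minimal illustration of value-dependence: in C, the expression `a / b` on ints is fine for almost every input and UB for exactly two cases, `b == 0` and `INT_MIN / -1`. Nothing in the source line tells you which one you have. The `checked_div` helper below is a hypothetical name, and it is a sketch of guarding one expression, not a general fix.

```c
#include <limits.h>
#include <stdbool.h>

/* Divides a by b only when the result is defined.
   Returns false and leaves *out untouched for the two UB cases. */
bool checked_div(int a, int b, int *out) {
    if (b == 0) {
        return false;               /* division by zero */
    }
    if (a == INT_MIN && b == -1) {
        return false;               /* quotient does not fit in an int */
    }
    *out = a / b;
    return true;
}
```

A caller that starts passing a new range of values into an unguarded `a / b` can introduce UB without touching the line that divides.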
If only those changes were all listed there...