by jeroenhd 1203 days ago
The semantics actually depend on the operating system and even on compiler flags. On macOS you can choose the size of your zero page during build. The numbers I've listed are just the defaults.

Zig UB is not C UB. There is an entire language built on top of it. Just because something behaves a certain way in C doesn't mean the same thing is true in Zig. Zig is no longer a code generator for C; it switched to a self-hosted compiler a while back. In fact, the language is rapidly progressing to the point where LLVM is a mere optional dependency.

I don't know the semantics around LLVM pointers. I don't see why 0x2 would be invalid; there are plenty of platforms programmed in C(++) that have a flat memory model. It would be quite painful to have a microcontroller where you can't send data to the output pin because LLVM decided that 2 is invalid (but 0 isn't). I've never seen LLVM complain about invalid dereferencing, though; it always ends up doing what the compiler tells it to do, as far as I can tell.

Zig pointers can definitely cause UB, but most Zig code shouldn't need them. Slices are bounds-checked and should probably be preferred in most cases where you would otherwise do pointer arithmetic. Simple pointers can't be incremented or decremented, so you need to go through @intToPtr manually if you want to do real pointer arithmetic, which is quite cumbersome.
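
To make the slice vs. raw pointer difference concrete, here is a rough sketch in Rust rather than Zig, purely as an analogy. The function names `checked_read` and `unchecked_read` are made up for illustration, and the integer round trip only loosely mirrors what going through @intToPtr looks like.

```rust
/// Bounds-checked access through a slice: an out-of-range index
/// yields None instead of touching memory outside the slice.
fn checked_read(data: &[u32], index: usize) -> Option<u32> {
    data.get(index).copied()
}

/// Manual pointer arithmetic via an integer round trip.
/// Nothing here checks that `index` stays inside the allocation;
/// the caller has to guarantee it.
fn unchecked_read(data: &[u32], index: usize) -> u32 {
    // Turn the base pointer into a plain integer address.
    let base = data.as_ptr() as usize;
    // Compute the element address by hand.
    let addr = base + index * std::mem::size_of::<u32>();
    // SAFETY: only sound while index < data.len(); otherwise this is UB.
    unsafe { *(addr as *const u32) }
}
```

With `[10, 20, 30]`, `checked_read` returns `None` for index 3, while `unchecked_read` with index 3 would simply read whatever sits past the array, which is exactly the kind of access the language no longer protects you from.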

I haven't used Zig much so I don't know how many Zig semantics are copies of C semantics and how many are translated by the Zig frontend. However, "this is a bad/undefined thing in C so it must be a bad/undefined thing in Zig" is simply not true.


I know Zig is not C, that's why I specifically mentioned LLVM. It's fine if Zig has different opinions about UB than LLVM does, but in that case ReleaseSafe builds should not use LLVM, not even optionally. If Zig says some operation is defined, but LLVM says it's undefined, well, LLVM is the one optimizing code so it's LLVM's invariants that matter. Right now it looks like Zig is playing fast and loose with correctness, shoving everything through LLVM but not respecting LLVM's invariants. And hey, if something is observed to segfault under some conditions today on the current version of LLVM, we'll just say segfaults are guaranteed. It's disappointing to see.
A lot of people have the same misunderstanding as you.

LLVM has rules about what is legal and what is not legal. If you follow the rules, you get well-defined behavior. It's the same thing in C. You could compile a safe language to C, and as long as you follow the rules of avoiding UB in C, everything is groovy.

Likewise, this is how Zig and other languages such as Rust use LLVM. They play by the rules, and get rewarded by well-defined behavior.

Isn't one of the LLVM rules that pointers must be valid and have valid provenance in order to be dereferenced? If 0x2 ends up in a pointer that is dereferenced (or 0x0 in a nonnull pointer), has that rule not been broken? And if the rule is broken, does that not trigger undefined behavior?
I invite you to share a snippet from the LLVM language reference[1] that backs up your interpretation.

I will return the courtesy with regard to my interpretation:

> An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM. Such ranges shall not overlap with any ranges of addresses allocated by mechanisms provided by LLVM. [2]

[1]: https://llvm.org/docs/LangRef.html

[2]: https://llvm.org/docs/LangRef.html#pointer-aliasing-rules

From the same section,

- Any memory access must be done through a pointer value associated with an address range of the memory access, otherwise the behavior is undefined.

- A null pointer in the default address-space is associated with no address.

A null pointer (0x0) is associated with no address, therefore it has no address range. So if you do attempt a memory access (dereference), the behavior is undefined. QED. A naive translation to assembly would indeed segfault on a modern OS, but LLVM's optimizations are free to assume that code path is unreachable and do anything else.
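
As a loose illustration of how a "never null" assumption gets baked into compiled code, here is a small Rust sketch. It is an analogy at the Rust level, not LLVM IR, and the function names are hypothetical. Rust guarantees that `Option<&T>` is pointer-sized and uses address zero to encode `None`, so the only way to stay within the rules is to check for null before treating a raw pointer as a reference.

```rust
use std::mem::size_of;

/// Option<&u32> needs no extra tag: the all-zero bit pattern is reserved
/// for None because a reference is assumed never to be null.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u32>>() == size_of::<*const u32>()
}

/// Reads through a raw pointer only after a null check.
/// `as_ref` returns None for a null pointer, so the null case
/// never reaches a dereference.
fn read_or_default(ptr: *const u32, default: u32) -> u32 {
    // SAFETY: the caller promises that a non-null `ptr` points to a live,
    // aligned u32; null is handled by `as_ref` returning None.
    match unsafe { ptr.as_ref() } {
        Some(value) => *value,
        None => default,
    }
}
```

If a null value were smuggled into a `&u32` anyway, code matching on `Option<&u32>` could not tell it apart from `None`. That is the sense in which breaking the invariant leaves the outcome up to whatever the optimizer assumed.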

Once the program is in this state, a bug of some kind is unavoidable. I don't take issue with that - what I take issue with is your claim that this behavior is well-defined, because it definitely is not. It would be equally valid for a null dereference to corrupt your program state or wipe your hard disk.

You have already admitted that 0x1, 0x2, etc. are fine. Your remaining argument rests entirely on the incorrect premise that Zig's only option is to lower to LLVM IR using the default address space.