by kevingadd 887 days ago
A fun additional twist to this is that dereferencing nullptr is valid in WebAssembly, and actual data can in fact end up there, though ideally it never will.

If you ensure that the 'zero page' (so to speak) is empty you can also exploit this property for optimizations, and in some cases the emscripten toolchain will do so.

i.e. if you have

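  template <typename T>
  struct MyArray {
    uint32_t length;  // number of elements in items
    T items[0];       // zero-length trailing array (a compiler extension)
  };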
you can elide null pointer checks and just do a single direct bounds check before dereferencing an element, because for a nullptr, (&ptr->length) == nullptr, and if you reserve the zero page and keep it empty, (nullptr)->length == 0.
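To make that concrete, here is a minimal sketch of the trick. Dereferencing a null pointer is still UB in native C++, so the sketch models a Wasm-style linear memory as a plain byte buffer where address 0 is an ordinary location. `LinearMemory`, `load_u32`, `store_u32` and `get_item` are hypothetical names made up for illustration, not emscripten APIs, and the layout (a 4-byte length at offset 0, then 4-byte items) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a Wasm linear memory: a flat, zero-initialized
// byte buffer in which address 0 is an ordinary, readable location.
struct LinearMemory {
    std::vector<uint8_t> bytes;

    explicit LinearMemory(std::size_t size) : bytes(size, 0) {}

    uint32_t load_u32(uint32_t addr) const {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes.size());
        uint32_t v;
        std::memcpy(&v, bytes.data() + addr, sizeof v);
        return v;
    }

    void store_u32(uint32_t addr, uint32_t v) {
        assert(static_cast<std::size_t>(addr) + 4 <= bytes.size());
        std::memcpy(bytes.data() + addr, &v, sizeof v);
    }
};

// Assumed layout of MyArray<uint32_t>: length at offset 0, items from offset 4.
// Returns true and writes *out when index is in bounds. There is no separate
// null check: if array_addr is 0 and the zero page is empty, length reads as 0
// and the bounds check rejects every index.
bool get_item(const LinearMemory& mem, uint32_t array_addr,
              uint32_t index, uint32_t* out) {
    uint32_t length = mem.load_u32(array_addr);
    if (index >= length) return false;
    *out = mem.load_u32(array_addr + 4 + index * 4);
    return true;
}
```

With a two-element array stored at some nonzero address, `get_item` returns the element for index 1 and fails for index 2. Called with `array_addr == 0` it fails for every index, but only as long as nothing has ever been written to the first bytes of the memory.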

this complicates the idea of 'passing nothing' because now it is realistically possible for your code to get passed nullptr on purpose and it might be expected to behave correctly when that happens, instead of asserting or panicking like it would on other (sensible) targets

4 comments

Because WASM is not C and there is no "nullptr" in WASM. In WASM, zero is just an address, as valid as any other. And C actually doesn't require the null pointer value to have bit pattern "all zeros", precisely to allow for architectures where treating zero address as invalid would be way too cumbersome. And some implementations actually took that option.
I wasn't aware the spec allowed for nullptr to not be 0, that's fascinating! In that case you could probably use 0xFFFFFFFF as long as you limit the size of the WASM heap to below 4GB, then. You'd risk having addresses wrap-around though.
Nothing stops you from having your null pointer in the middle of the address space. Some C compiler for DOS or early Windows did that IIRC (it was 0xB800 or something?.. so that it wouldn't accidentally corrupt the interrupt table). Also, C explicitly prohibits address wrap-around problems for pointers:

    Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
But this is fine, since pointer comparisons (as in, less/greater comparisons) are actually both pretty restricted and required to have reasonable semantics when comparing pointers that point into the same object/array:

    When two pointers are compared, the result depends on the relative locations in the address space of the objects pointed to. If two pointers to object types both point to the same object, or both point one past the last element of the same array object, they compare equal. If the objects pointed to are members of the same aggregate object, pointers to structure members declared later compare greater than pointers to members declared earlier in the structure, and pointers to array elements with larger subscript values compare greater than pointers to elements of the same array with lower subscript values. All pointers to members of the same union object compare equal. If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P. In all other cases, the behavior is undefined.
By the way, this means that, among other things, if you use the number N to represent a null pointer, then N-1 can never be a valid pointer to anything: adding 1 to a valid pointer is always allowed, and this addition must produce a non-null pointer, because the resulting pointer is required to be well-behaved in comparisons, and relational comparisons with a null pointer are UB.
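A rough sketch of what that constraint means for whoever lays out memory, assuming a 32-bit address space and treating pointers as plain byte addresses. `can_place` and its parameters are hypothetical names for illustration; the idea is only that no address inside an object, and not its one-past-the-end address either, may collide with the null representation or wrap around.

```cpp
#include <cstdint>

// Returns true if an object of `size` bytes may start at byte address `base`
// when `null_addr` is the bit pattern chosen for the null pointer.
// Assumes a 32-bit address space.
bool can_place(uint32_t base, uint32_t size, uint32_t null_addr) {
    if (size == 0) return false;

    // Compute the one-past-the-end address in 64 bits so wrap-around is visible.
    uint64_t one_past = static_cast<uint64_t>(base) + size;
    if (one_past > 0xFFFFFFFFull) return false;  // would wrap or leave the address space

    // Neither an address inside the object nor the one-past-the-end address
    // may equal the null representation.
    bool collides = (base <= null_addr) && (null_addr <= one_past);
    return !collides;
}
```

With `null_addr = 0xFFFFFFFF`, a one-byte object at `0xFFFFFFFE` is rejected (its one-past-the-end address would be null), while one at `0xFFFFFFFD` is accepted. The same check works for a null value in the middle of the address space, such as the `0xB800` guess above.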
I'm not sure... wasm is an assembly, not a C implementation. It can define what happens if you load from 0 but it doesn't get to define if the C code `*nullptr` actually loads from 0. Whether or not it does is defined by your compiler, which is probably the clang frontend if you're on emscripten. But then again I think there's a clang flag to disable optimizing away reads/writes to nullptr.
HPUX must have had something similar, as when AOL backend code was ported to Solaris, which does segv on null dereference, we found all kinds of places where code that had been running without notable incident on HPUX started dropping core.
I’m kind of surprised it’s not defined that the first page must be 0-mapped read only… this sounds like a security vulnerability, because it’s unlike anything other machine code is written against and thus violates all sorts of safety assumptions.
Do you mean that as written? I'd find that extremely surprising, and it would, in my mind, violate all sorts of safety assumptions, primarily that deref'ing NULL traps¹.

E.g., I am pretty sure Go relies on some of the behavior described here: that the 0 page is unmapped, and that accesses will trap. This is why Go code will sometimes SIGSEGV despite being an almost memory-safe language: Go is explicitly depending on that trap (and it permits Go, in those cases, to elide a validity check). (Vs. some memory accesses will incur a bounds check & panic, if Go cannot determine that they will definitely land in the first page; Go there must emit the validity check, and failing it is a panic, instead of a SIGSEGV.)
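The decision described above can be sketched roughly like this. It is written in C++ only to illustrate the reasoning and is not how the Go compiler is actually implemented; `plan_nil_check`, `NilCheck` and the `guard_bytes` parameter are hypothetical names, and the guard size is an assumption you pass in.

```cpp
#include <cstdint>

enum class NilCheck {
    RelyOnTrap,  // no code emitted; a nil base pointer faults in the unmapped region
    Explicit     // emit a compare-and-panic before the access
};

// Decide how to guard an access of `access_size` bytes at `field_offset`
// from a possibly-nil base pointer, given that the first `guard_bytes`
// bytes of the address space are assumed to be unmapped.
NilCheck plan_nil_check(uint64_t field_offset, uint64_t access_size,
                        uint64_t guard_bytes) {
    // If the whole access stays inside the unmapped region when the base is
    // nil, the hardware trap is enough. Otherwise it could land on mapped
    // memory, so an explicit check is needed.
    if (field_offset + access_size <= guard_bytes) {
        return NilCheck::RelyOnTrap;
    }
    return NilCheck::Explicit;
}
```

With a 4096-byte guard, an 8-byte field at offset 8 can rely on the trap, while a field at offset 4096 or beyond needs the explicit check. With `guard_bytes = 0`, which is the situation described for WebAssembly at the top of the thread, every access needs the explicit check.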

IIRC, Linux doesn't permit unprivileged processes, at least, to map address 0. (Although I can't find a source for that right now.)

¹Yes, in most languages this is UB … but what I'm saying is that having it trap makes errors — usually security errors — obvious & fail, instead of really letting the UB just do whatever and really going off into "it's really undefined now" territory.

Ideally it would be an unmapped trap considering it’s literally how every other runtime works. The next best option is to make it read only. The dumbest option is to make it read/write as that’s going to be a vector for security vulnerabilities.
Security researchers are crafty. I wouldn't give them a read-only page, either. They'll find a way to turn a null-deref with that into an exploit.

"And then we just look for the UID under this NULL pointer — and hey, that's a read-only page of zeros! We're now root." Or something.