by anderskaseorg 696 days ago
Note that the “unsafe read beyond of death” trick is considered undefined behavior in the Rust and LLVM memory model, even if it’s allowed by the underlying hardware. Like any undefined behavior, compilers are allowed to assume it doesn’t happen for the purpose of optimization, leading to results you don’t expect. The only way around this is to use inline assembly.

https://github.com/ogxd/gxhash/issues/82


It would be neat to have non-assembly options for things like this. A "load with unspecified elements for any values past the end of the allocation, UB only if the hardware doesn't like it" thing shouldn't be hard to support, even if just as an alias for the respective assembly invocations.

Additional neatness would be being able to request a guarantee that all allocations - malloc, stack, constants - have at least, say, 64 bytes of non-faulting addresses after them, though that is significantly more complex, requiring cooperation between a bunch of parts.

Annoying thing is that this is trivial with a custom allocator (as long as the compiler isn't told to consider the custom sub-allocations as separate), but then you're stuck not being able to use your SIMD stuff on anything outside your custom heap due to the very tiny chance of segfaulting.

Sanitizers/valgrind don't even necessarily become pointless with this: the past-the-end values are still undefined, can be tracked as such, and can trigger an error on use.

The sanctioned way to do this would be masked aligned load intrinsics: alignment avoids page faults, masking avoids reading undef bits, and being an intrinsic conveys the intent to the compiler, so it knows that this is not an OOB read.

The other option that I've seen discussed is adding a freezing load to LLVM that turns the undef bits into some unspecified but valid bit patterns.

> A "load with unspecified elements for any values past the end of the allocation, UB only if the hardware doesn't like it" thing shouldn't be hard to support

Not an expert, but to me this sounds like you want an alternative where behaviour for a read beyond the end of an allocation is merely implementation-defined, not undefined. That means the implementation (e.g. LLVM) has to document what they do — which may be platform-dependent — and the choice of whether it becomes undefined is up to the implementation.

The natural thing to do here for the implementation is of course to say "I'm just going to emit the load instruction, it may crash your program, better be prepared".

Here it'd be perfectly fine to define it as "completely arbitrary bits past the end, potentially even differing between back-to-back calls of loading the same memory"; specific backends will end up refining that of course. In LLVM those bytes would behave as freeze(poison).
Not every platform in existence will return data when asked to access stuff out of bounds, even when sufficiently aligned. So you wouldn't want to bake into the standard that valid bits must be returned; you'd want to allow crashing, in the standard. An implementation might then define that for suitably aligned addresses, data will be returned (just not necessarily sensible data).
It should still come with "UB only if the hardware doesn't like it", of course. If weird funky hardware that doesn't follow the usual memory paging is a worry, providing a "memory_protection_granularity" constant is trivial, to be used instead of the page size for the check (and said funky hardware could set it to 1, thus always failing).

Alternatively, a different API would be returning an optional of the loaded data, having the stdlib/language/backend convert that to the appropriate boundary check (or always returning a None if impossible).
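
A rough sketch of what such an optional-returning load could look like, combined with the granularity constant idea from above. Everything here is hypothetical: `WIDTH`, `overread_allowed`, `load_or_none`, and the `granularity` parameter are made-up names, not an existing API. Since a real over-read can't be expressed in Rust without hitting the UB being discussed, the past-the-end lanes are modeled as zeros standing in for "unspecified" bits.

```rust
/// Assumed load width in bytes (one 128-bit vector).
const WIDTH: usize = 16;

/// Decide whether a WIDTH-byte load at `addr`, of which only the first
/// `valid` bytes are in bounds, could be issued without faulting.
/// `granularity` is the hypothetical "memory_protection_granularity".
fn overread_allowed(addr: usize, valid: usize, granularity: usize) -> bool {
    if valid >= WIDTH {
        return true; // the load is fully in bounds
    }
    if valid == 0 {
        return false; // no in-bounds byte proves the granule is mapped
    }
    // The last in-bounds byte and the last loaded byte must share a granule.
    let last_valid = addr + valid - 1;
    let last_loaded = addr + WIDTH - 1;
    last_valid / granularity == last_loaded / granularity
}

/// Returns `None` when the over-read cannot be shown to be non-faulting.
/// Model only: out-of-bounds lanes are zero-filled instead of being read.
fn load_or_none(data: &[u8], granularity: usize) -> Option<[u8; WIDTH]> {
    if !overread_allowed(data.as_ptr() as usize, data.len(), granularity) {
        return None;
    }
    let mut out = [0u8; WIDTH];
    let n = data.len().min(WIDTH);
    out[..n].copy_from_slice(&data[..n]);
    Some(out)
}
```

With a granularity of 1, every load that would go past the end returns `None`, which matches the "always failing" case for hardware that offers no guarantee.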

Ideally there'd be languages that can at least be configured into providing more "unsafe" useful things, even at the expense of the code not being compilable for funky hardware that no one would run the software in question on anyway.

What about tools like ASAN? I want it to be able to tell me if I read a single character out of bounds. Tools like ASAN can't do this if the language gets rid of undefined behavior. The reason why undefined behavior is undefined is because it's such a degenerate state for a program to exist in that any attempt by a language to imbue it with a particular blessed meaning is, to put it politely, crazy; like trying to prove a theorem that's allowed to have some contradictions.
The simplest solution, and the one I use, is to have all SIMD-related buffers use a custom allocator (actually everything uses it) that always rounds the allocation size up to the SIMD width.
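
A minimal sketch of the rounding idea, assuming a 32-byte SIMD width and using a plain `Vec<u8>` as a stand-in for the custom allocator. `SIMD_WIDTH`, `padded_size`, and `PaddedBuf` are illustrative names, not anything from a real allocator.

```rust
/// Assumed SIMD register width in bytes (32 would be one AVX2 register).
const SIMD_WIDTH: usize = 32;

/// Round `size` up to the next multiple of `SIMD_WIDTH`.
fn padded_size(size: usize) -> usize {
    (size + SIMD_WIDTH - 1) / SIMD_WIDTH * SIMD_WIDTH
}

/// Hypothetical buffer whose backing storage always covers whole SIMD blocks.
struct PaddedBuf {
    data: Vec<u8>, // length is a multiple of SIMD_WIDTH, padding is zeroed
    len: usize,    // number of bytes the caller actually asked for
}

impl PaddedBuf {
    fn from_slice(src: &[u8]) -> Self {
        let mut data = vec![0u8; padded_size(src.len())];
        data[..src.len()].copy_from_slice(src);
        PaddedBuf { data, len: src.len() }
    }

    /// Full-width blocks; the last one may contain zeroed padding.
    fn blocks(&self) -> std::slice::ChunksExact<'_, u8> {
        self.data.chunks_exact(SIMD_WIDTH)
    }

    /// Only the bytes the caller stored, without the padding.
    fn as_slice(&self) -> &[u8] {
        &self.data[..self.len]
    }
}
```

Note that this only covers loops that walk the buffer in whole blocks from its start; a full-width load at an arbitrary offset near the end can still run past the padded size.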

Masked loads kinda suck: they are a tiny bit slower, and you now need a mask, which you also have to compute.

This is what I do too (in my case I don't round up the allocation size and just let loads & stores potentially see the next object (doing tail stores via load+blend+store where needed; only works if multithreaded heap mutation isn't required though)).
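
To illustrate the load+blend+store tail store, here is a scalar model over a 16-byte block (the width and the name `store_tail` are assumptions for the sketch; real code would use vector loads, a blend, and a vector store). It also shows why concurrent heap mutation is a problem: the bytes past the tail get rewritten with whatever was loaded a moment earlier.

```rust
/// Assumed vector width in bytes.
const WIDTH: usize = 16;

/// Write `src` (at most WIDTH bytes) to the start of `block` using a
/// full-width load, a blend, and a full-width store.
fn store_tail(block: &mut [u8; WIDTH], src: &[u8]) {
    assert!(src.len() <= WIDTH);
    // "Load": read the whole block, including bytes owned by the next object.
    let mut blended = *block;
    // "Blend": replace only the lanes that belong to the tail.
    blended[..src.len()].copy_from_slice(src);
    // "Store": write the whole block back. Lanes past the tail are rewritten
    // with their old values, which is only harmless if nothing else changed
    // them between the load and the store.
    *block = blended;
}
```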

The one case it can be annoying is passing pointers to constant data to custom-heap-assuming functions - e.g. to get a pointer to [n,n-1,n-2,...,2,1,0] for, say, any n≤64, make a global of [64,63,...,2,1,0] and offset its pointer; but you end up needing to add padding to the global, and this materializes as avoidable binary size increase as the "padding" could just be other constants from anywhere else. Copying the constant to the custom heap would be extra startup time and more memory usage (not sharable between processes).
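
A sketch of the offset-into-a-global trick, including the padding it ends up needing. It assumes 16-byte loads, so 15 bytes of padding after the final element; `TABLE`, `countdown`, and `countdown_window` are made-up names.

```rust
/// Assumed load width in bytes.
const WIDTH: usize = 16;
/// Padding so a WIDTH-byte load starting at the last element stays in bounds.
const PAD: usize = WIDTH - 1;

/// [64, 63, ..., 1, 0] followed by PAD zero bytes, built at compile time.
static TABLE: [u8; 65 + PAD] = {
    let mut t = [0u8; 65 + PAD];
    let mut i = 0;
    while i < 65 {
        t[i] = (64 - i) as u8;
        i += 1;
    }
    t
};

/// [n, n-1, ..., 1, 0] for any n <= 64, as a view into the global.
fn countdown(n: usize) -> &'static [u8] {
    assert!(n <= 64);
    &TABLE[64 - n..65]
}

/// A full WIDTH-byte window starting at n; in bounds only thanks to PAD.
fn countdown_window(n: usize) -> &'static [u8] {
    assert!(n <= 64);
    &TABLE[64 - n..64 - n + WIDTH]
}
```

The `PAD` bytes are exactly the avoidable binary size increase mentioned above: they exist only so the window for small `n` stays inside the global.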

"UB only if the hardware doesn't like it" sounds like you want to shift the complexity from the developers who know the problem domain best to the packagers.

As soon as the thing is packaged to run on a Raspberry Pi or something else that doesn't like it, it will start to generate CVEs and be a major pain.

This shouldn't ever be a security vulnerability, outside of perhaps denial of service from segfaults (though I'm pretty sure you'd find hardware with no page faults before finding one with pages less than 4KB; and of course, if you wanted to not be hard-coding 4KB, a compiler providing a "minimum page size" constant for the target architecture should be possible, and could return 1 on page-less hardware). But, yes, as with many optimizations, getting them wrong could end up badly.
For the case of specific vector extensions that imply specific cache line sizes, and loads that do not span multiple cache lines, I don't think you could run into issues.
Is that even true at a hardware level? What if you read into an unmapped page or into protected memory? (I haven't read the code, maybe it has alignment guarantees that avoid this?)
You make sure you don't do that.

A trick to avoid reading beyond the end of the buffer is to make sure the end of the buffer lies on the same page. Typically, the OS will allocate memory in pages of 4KB, thus we can make a function that checks whether it is okay to read beyond or if we should fallback to the copy version.

-- https://ogxd.github.io/articles/unsafe-read-beyond-of-death/
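
For reference, the check described in that quote could be sketched like this, assuming 4KB pages and 16-byte reads. The names are mine, not the article's, and only the check and the copy fallback are shown; the fast path is the over-read that the rest of this thread argues is UB in Rust, so it is left out.

```rust
/// Assumed page size; the quote takes 4KB as the typical case.
const PAGE_SIZE: usize = 4096;
/// Assumed read width in bytes (one 128-bit vector).
const VECTOR_SIZE: usize = 16;

/// True if a VECTOR_SIZE-byte read starting at `addr` ends on the same page
/// it starts on, so it cannot touch a different (possibly unmapped) page.
fn read_stays_on_page(addr: usize) -> bool {
    (addr & (PAGE_SIZE - 1)) <= PAGE_SIZE - VECTOR_SIZE
}

/// The "copy version": copy the available bytes into a zeroed,
/// full-width buffer so the vector load is always in bounds.
fn load_copy(data: &[u8]) -> [u8; VECTOR_SIZE] {
    let mut out = [0u8; VECTOR_SIZE];
    let n = data.len().min(VECTOR_SIZE);
    out[..n].copy_from_slice(&data[..n]);
    out
}
```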

That's not a guarantee. On some systems memory protection can be sub-page (not sure about x86).

But it sounds like the masking feature mentioned in a sibling comment takes care of it anyway.

Masking is nice, but not available everywhere (e.g. Intel is still making new generations of CPUs without AVX-512, and Apple silicon doesn't have any masked loads/stores either).

It might not be the nicest thing to assume to be the case on all hardware, but it shouldn't be too unreasonable to put it under an "if (arch_has_a_minimum_page_size)". So many things already assume at least 4KB pages, Intel/AMD aren't gonna break like half the world. If anything, they'd want to make larger pages to make larger L1 caches more feasible.

There's a debate on how unsafe/unsound this technique actually is. https://github.com/ogxd/gxhash/issues/82

I definitely see the conundrum since the dangerous code is such a huge performance gain.

The code uses unaligned load and store instructions, so it should be possible to trigger memory access to unmapped addresses.
Isn't the point of the "masked load" instruction discussed in the article to avoid that? https://stackoverflow.com/a/54530225
Unfortunately, AMD's masked AVX2 instructions reserve the right to fault even for masked-off elements :(
> Like any undefined behavior, compilers are allowed to assume it doesn’t happen for the purpose of optimization, leading to results you don’t expect

No. First, undefined behavior is a term of art in the C standard, so the idea of generalizing it is nonsensical. Second, ANSI C explicitly does not allow this assumption, and ISO C—while more open ended—doesn't specifically justify this assumption. The entire "UB = assume it cannot happen" thing is grossly dishonest FUD.

Both (unsafe) Rust and LLVM have their own concepts of undefined behavior (https://doc.rust-lang.org/reference/behavior-considered-unde..., https://llvm.org/docs/LangRef.html#undefined-values), and while some of the details vary by language, compilers for all of these languages do in fact optimize based on the assumption that undefined behavior is not invoked in the abstract execution model. This is a real thing (https://blog.llvm.org/2011/05/what-every-c-programmer-should..., https://highassurance.rs/chp3/undef.html), and any debate about whether it’s “justified” is many decades late and entirely academic (but we don’t have any other methodology for building optimizers of the quality that programmers expect from compiled languages).
> any debate about whether it’s “justified” is many decades late

"It's always been this way so it's impossible to address." Forgive me if I'm not convinced.

From https://highassurance.rs/chp3/undef.html:

> In other words, should a developer inadvertently trigger UB, the program can do absolutely anything.

Well, no. It is "behavior . . . for which this International Standard imposes no requirements." There are restraints and constraints beyond the ISO standard.

realloc(p, 0) is now undefined in C23. However, every mainstream OS and compiler specifies the correct behavior for that environment. It is simply not. true. that the program can do anything. What is true is that the range of behavior is not restricted by the ISO standard.

At least 15 years ago, the team responsible for developing both our ASICs and the core routing/switching code for the firmware on our networking devices worked under the consensus understanding that "Undefined Behavior" (in C, in their case) meant precisely that: it could trigger a nuclear launch, blow up the planet, etc. No behavior (within the confines of the limitations of the C compiler(s) for the various hardware platforms we built on) was ruled out.

A very significant amount of their effort, time, focus went into understanding precisely what was, and was not "Undefined Behavior" - the instant you did anything that was "undefined" anything that happened after that was fair game.

They also did zero dynamic memory allocation after starting an application. All memory was allocated on startup based on initial config settings.

My sense in watching that extraordinarily skilled team was that the logic and features they were building (on a very complex FHSS MAC) were secondary to convincing the compiler and hardware to do what the specifications and definitions said should happen. The great firmware developers were also pretty solid language lawyers.

I'm not saying developers who are careful about UB aren't doing the right thing. They are absolutely doing the right thing.

What I am saying is that a compiler that sees

    int8_t x;
    float x;
and does anything other than "terminating a translation or execution (with the issuance of a diagnostic message)" is doing the wrong thing.

I am also saying that a compiler that offers -fwrapv and formats your hard drive on int x = INT_MAX; x++; rather than "behaving during translation or program execution in a documented manner characteristic of the environment" is pathological, violates the spirit of the ANSI and ISO standards, and violates the letter of the ANSI standard.

You may not like it, and it’s within your rights not to like it, but the reality is that compilers do treat UB this way, and it’s not “grossly dishonest FUD” to point out that this is the reality. Here’s a test case demonstrating that UB can actually format your hard drive: https://bugs.llvm.org/show_bug.cgi?id=49599.

Note that one of the differences between C and Rust is that integer overflow is not UB in Rust (it panics in debug mode and wraps in release mode: https://doc.rust-lang.org/book/ch03-02-data-types.html#integ...). But there are other sources of UB in unsafe Rust, such as reads through a pointer that the memory model does not allow.
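
As a small illustration of the overflow point, the standard integer methods below make the intended behavior explicit, so the result is the same in debug and release builds. The wrapper function names are just for the example.

```rust
/// Two's-complement wraparound, regardless of build mode.
fn increment_wrapping(x: i32) -> i32 {
    x.wrapping_add(1)
}

/// `None` on overflow instead of panicking or wrapping.
fn increment_checked(x: i32) -> Option<i32> {
    x.checked_add(1)
}

/// The wrapped result plus a flag saying whether overflow happened.
fn increment_overflowing(x: i32) -> (i32, bool) {
    x.overflowing_add(1)
}
```

None of these is UB on `i32::MAX`, unlike `x++` on `INT_MAX` in C.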