Hacker News new | ask | show | jobs
by dzaima 458 days ago
Rust doesn't help here; you necessarily must do all stores in potentially-mirrored memory as volatile (and possibly loads too), else you can have arbitrary spooky-action-at-a-distance issues, as, regardless of &[T] vs &mut [T] or whatever language-level aliasing features, if the compiler can see that two addresses are different (which they "definitely" are if the compiler, for one reason or another, knows that they're exactly 4096 bytes apart) it can arbitrarily reorder them, messing your ring buffer up. (and yes it can do so moving ops out of language-level lifetimes as the compiler knows that that should be safe)

vmcircbuf just exposes the mutable mirrored reference, resulting in [1] in release builds. Obvious issue, but, as my example never uses multiple references with overlapping lifetimes of any form, the issue would not be fixed by any form of more proper reference exposing; it's just simply the general issue of referencing to the same data in multiple ways.

vmap afaict only exposes push-back and pop-front for mutation, so unfortunately I think the distance to cross to achieve spooky action in practice is too far (need to do a whole lap around the buffer to write to the same byte twice; and critical methods aren't inlined so nothing to get the optimizer to mess with), but it still should technically be UB.

slice_deque has many open issues about unsoundness. magic-ring-buffer doesn't build on modern rust.

[1]: https://dzaima.github.io/paste/#0TVDBTsQgFLz3K56XbptsWlo1MWz...

2 comments

> Rust doesn't help here; you necessarily must do all stores in potentially-mirrored memory as volatile (and possibly loads too), else you can have arbitrary spooky-action-at-a-distance issues, as, regardless of &[T] vs &mut [T] or whatever language-level aliasing features, if the compiler can see that two addresses are different (which they "definitely" are if the compiler, for one reason or another, knows that they're exactly 4096 bytes apart) it can arbitrarily reorder them, messing your ring buffer up.

Hmm, as I think about it, I see your point about LLVM's optimizer potentially "knowing" memory hasn't changed that really has if it inlines enough even if it's never put into the same &mut [T] as the other side of the mirror (and two improperly aliased &mut [T] are never constructed).

But as an alternative to doing all the stores in a special way (and loads...don't see how doing a volatile store to one side of the mirror is even sufficient to tell it the other side of the mirror has changed)...it'd be far more practical if the caller could use a (not mirrored) &mut [T]. Couldn't you have an std::ops::IndexMut wrapper that returns a guard that has a DerefMut into &mut [T] and on Drop creates a barrier for these kinds of optimizations via `std::arch::asm!("")`? [1] Then LLVM has to assume all memory changed in that barrier.

Regarding the more specific crate issues: I found these crates a while ago and hadn't looked extensively in their implementation. Thanks for pointing these out; I will have to look more closely if/when I ever decide to actually use this approach. I was leaning toward no anyway because of the other factors I mentioned. As an alternative, I was thinking of having a ring buffer + a little extra bit at the end that is explicitly copied from the start as needed. The maximum length of one message I need a contiguous view of is far less than the total buffer size, so only a fraction of the buffer would need to be copied.

> vmcircbuf just exposes the mutable mirrored reference, resulting in [1] in release builds.

Yuck, noted, clearly wrong to give the whole thing as a `&mut [T]`.

> slice_deque has many open issues about unsoundness.

I see at least couple of those, which seem to be "just" the usual unsafe-done-wrong sorts of things (double frees) rather than anything inherent to the mirrored buffer.

[1] https://stackoverflow.com/questions/72823056/how-to-build-a-...

Yeah, an asm marked as memory-clobbering is the proper thing; not the first time I've forgotten that volatile entirely doesn't imply anything to other memory. (in fact, doing "((volatile uint8_t*)x)[0] = 0xaa;" in my godbolt link in a sibling thread still has the optimization happen). Don't know how exactly it interacts with aliasing rules; maybe you'd have to explicitly pass the mutable reference to the asm as an input, otherwise it'd be illegal for the asm to change it and so the compiler can still assume it isn't? or, I guess, not have any references live during the asm call is the proper thing.

Probably indeed possible to do it with proper guards (the pre-pooping your pants issue is probably not a problem if you also have the asm guard in drop?).

> I see at least couple of those, which seem to be "just" the usual unsafe-done-wrong sorts of things (double frees) rather than anything inherent to the mirrored buffer.

Yeah, possible. I was just saying that from the perspective of proving that all the ring buffers not taking extreme care are incorrectly implemented.

> Don't know how exactly it interacts with aliasing rules; maybe you'd have to explicitly pass the mutable reference to the asm as an input, otherwise it'd be illegal for the asm to change it and so the compiler can still assume it isn't? or, I guess, not have any references live during the asm call is the proper thing.

I don't know either, but really it's the opposite half of the buffer you want to tell it may have changed, so I imagine it doesn't matter even if you still have the `&mut [T]` live.

Maybe the extra guard I described isn't necessary either; the DerefMut could directly return `&mut [T]` but set a `barrier_before_next_access` on the ring, or you could just always have the barrier, whatever performs best I guess.

>so unfortunately

I see a fellow enjoyer of bugs ;)

>vmap afaict only exposes push-back and pop-front for mutation

what about https://doc.rust-lang.org/nightly/std/io/trait.Write.html#ty... ?

>and critical methods aren't inlined

aren't inlined explicitly. This does not mean that they are not inlined in practice (depending on build options). Also, LLVM can look inside a noinline available method body for alias analysis :(

This is a big pain whenever one wants to do formally-UB shennenigans. I'm not a rustacean, but in julia a @noinline directive will simply tell LLVM not to inline, but won't hide the method body from LLVM's alias analysis. For that, one needs to do something similar to dynamic linking, with the implied performance impact (the equivalent of non-LTO static linking doesn't exist in julia).

> I see a fellow enjoyer of bugs ;)

Yep! :)

I did look at the assembly on a release build and the write method was in fact not inlined (needed to get the compiler to reason about the offset aliasing); that write method is what I called "push-back" there. I could've modified the crate to force-inline, but that's, like, effort, just to make a trivially-true assertion for one HN post.

Indeed a lack of an equivalent of gcc's __attribute__((noipa)) is rather annoying with clang (there are like at least 4 issues and 2 lengthy discussions around llvm, plus one person a week ago having asked about it in the llvm discord, but so far nothing has happened); another obvious problem being trying to do benchmarking.

(for reference, what I was trying to get to happen was an equivalent of https://godbolt.org/z/jobs6M95G)