Hacker News new | ask | show | jobs
by VorpalWay 93 days ago
Very interesting discussion. I hadn't realised that the final provenance model hadn't yet been decided for C and C++.

Angelic non-determinism seems difficult to use to determine if an optimisation is valid. If I understand this correctly, it is basically the as-if rule, but in this case applied to something that potentially needs global program analysis. Would that be an accurate understanding?

It sounds like both of these proposals will be strictly less able to optimize than strict provenance in rust to me. In particular, Rust allows applying a closure/lambda to map a pointer while keeping the provenance. That avoids exposing the provenance as you add and remove tag bits, which should at least in theory allow LLVM to optimise better. (But this keeps the value as a pointer, and having a dangling pointer that you don't access is fine in Rust, probably not in C?)

I'm not sure why I'm surprised actually, Rust can be a more sensible language in places thanks to hindsight. We see this in being able to use LLVM noalias (restrict basically) in more places thanks to the different aliasing model, while still not having the error prone TBAA of C and C++. And it doesn't need a model of memory consisting of typed objects (rather it is all just bytes and gets materialised into a value of a type on access).

1 comments

No, angelic non-determinism is not related to the as-if rule. It essentially says that if there is a choice to assign provenance on backconversion from integers, the one which makes the program valid is assigned. This is basically the same as the explicit UDI rule in TS 6010, except that this is rule is very clear. The problematic with angelic non-determinism is two-fold: a) most people will not be able to reason about it at all, and b) not even formal semantics experts know what it means in complicated cases. Demonic non-determinism essentially means that all possible execution must be valid while angelic non-determinism that there must exist at least one. Formally, this translates to universal and existential quantifiers. But for quantifiers, you must know where and in which order to place them in a formula, which wasn't clear all from the wording I have seen (a while ago). The interaction with concurrency is also a can of worms.

I don't think there is a fundamental advantage to Rust regarding provenance. Yes, we lack a way to do pointer tagging without exposing the provenance in C, but we could easily add this. But this is all moot as long as compilers are still not conforming to the provenance model with respect to integer and pointer casts anyway and this breaks Rust too! Rust having decided something just means they life in fairy tale world, while C/C++ not having decided means they acknowledge the reality that compilers haven't fixed their optimizers. (Even ignoring that "deciding" means entirely different things here anyway with C/C++ having ISO standards.)

> But this is all moot as long as compilers are still not conforming to the provenance model with respect to integer and pointer casts anyway and this breaks Rust too! Rust having decided something just means they life in fairy tale world, while C/C++ not having decided means they acknowledge the reality that compilers haven't fixed their optimizers.

I think this is a bit of a mischaracterization. While there can of course be bugs in LLVM (and rustc and clang), what sort of LLVM IR you generate matters. To be able to generate IR that conforms to the provenance model of the language you first need to have such a model.

As far as I know (and this matches what I found when search the rust issue tracker) there is currently one major known LLVM bug in this area (https://github.com/rust-lang/rust/issues/147538) with partial workarounds applied on the Rust side. There is some issues with open question still, such as how certain unstable features should interact with provenance.

I think calling the current situation "fairy tale world" is a gross exaggeration. Is it perfectly free of bugs? No, but if that is the criteria, then the entirety of any compiler is a fairy tale (possibly with the exception of some formally verified compiler).

I am not sure this is a mischaracterization. The C provenance model also exists, even as a form of an ISO TS. The Rust model copied the basic concepts and even the terminology from us. The reason the C model is is not in ISO C 23 but in a separate TS is because compilers are not able to implement correctly at this time due to bugs. But neither do they implement the Rust model correctly because of the same bugs.

One should also point out that basic provenance is already part of the ISO C standard for a long time (but not under this name). That a precise technical specification is needed is only because the exact details were not clear and there are inconsistencies and differences between and even inside compilers. Rust having a precise model does not make these problems automatically go away just as the ISO TS does not.