Hacker News new | ask | show | jobs
by pwdisswordfish2 2170 days ago
> I'm not sure I understand this. Does it not produce a run time error? Why not?

It’s not supposed to. Type casting with ‘as’ is supposed to be lightweight and always succeed; there is no room in the type system to return an error. In case lossless casting is not possible, some value still has to be returned. Until now, this was outright UB — meaning the compiler is not even obligated to keep it consistent from one build to another. Saturating, while still not optimal, is at least deterministic.

> This looks very dangerous, because it essentially does the "nearest to right" thing.

That’s why the intention is to introduce more robust approximate conversion functions and eventually probably deprecate ‘as’ casts altogether. There has been a number of discussions about this; current disagreements seem to be about how to handle the various possible rounding modes.

4 comments

> meaning the compiler is not even obligated to keep it consistent from one build to another.

Way worse than that. The compiler wasn't obligated to act like anything at all. It would be totally legal to compile it so that the first time the value was accessed you got 0, the next time you got 1 - within the same program execution, with no mutation of the value. That is the sort of thing that is observed behavior of UB in the worst cases, and why it's so terrible to just pretend that UB is innocuous.

Way worse than that, even. UB poisons every state of the program that eventually results in UB. For example, the optimizer is well within its rights to remove as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution.
That could literally produce no output program?
> That could literally produce no output program?

Way worse than even that (you might be noticing a theme here...). Once the optimizer has removed as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution, it can conclude that the other branch is now the only possible execution, and call it unconditionally, even if that leads to removing all your files (the classic example is https://kristerw.blogspot.com/2017/09/why-undefined-behavior...).

Yep! Dumb example.

    main()
      x = get_from_some_external_data_source()
      if x:
        print("Hello World")
        trigger_ub()
You might expect this code to always print if x is true but the optimizer can look at this and say "welp, if x is true then it would trigger ub, therefore it must be false, and since x must always be false we can just remove that entire branch."
My favorite example along these lines (in C) is "Cap'n'Proto remote vuln: pointer overflow check optimized away by compiler"[1] which was covered here a few years back and shows all of these "theoretical" compiler behaviors coming to a head in a real bug which is thoroughly explained.

1: https://news.ycombinator.com/item?id=14163111

c.f. “nasal demons”
As it was UB anyway, wouldn’t the compiler also be within its rights to detect the problem and panic the thread?
Yes, it would. However, it rarely does because of how the incentives are: it's useful for optimization to pretend that UB never happens, and more optimization leads to faster binaries, which are what the compiler engineers often pursue.
Right, but given that panicing was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs? It seems much better to loudly proclaim the existence of an erroneous precondition, which is consistent with how things like array indexing behave.

I guess I’ll have to go dig up the RFC discussion on this one; it should make for interesting reading.

1. Panicing is not in line with how `as` casts are supposed to act. (e.g. `u32value as u8` does not panic but just takes the "lower" one byte.)

2. This might (I haven't profiled it) introduce performance regressions in ways which should not happen.

3. Besides in some usages around `dyn` other usages of `as` get increasingly more alternatives. It's just a question of time until `as` (for int/float casts) is recommended to not be used at all, maybe even linted against.

4. Given precedence of many other programming languages people don't expect a "simple" float to int cast to be failable. (The new methods replacing `as` make the fallibility clear, as it's e.g. `u64::try_from(bigf64)`).

5. It's udef-ness is only detected/handled in llvm, _I don't know_ if llvm provides similar well integrated mechanisms for this as it does for integer overflows. If not that would be another problem.

I came here with same question as davrosthedalek "Why doesn't it just panic?" and yours is an excellent and very convincing answer. Thanks for writing it up.
> Panicing is not in line with how `as` casts are supposed to act. (e.g. `u32value as u8` does not panic but just takes the "lower" one byte.)

So, instead of this being traditional UB, it was a combination of two separate issues:

- Rustc erroneously emitting code that exercised an LLVM UB case, and

- Imprecise Rust documentation around the exact behavior of float -> int ‘as’ casts

That phrasing seemingly implies that there was a generally-agreed intended behaviour for that case that was unfortunately neither documented nor actually exhibited. But that was not the case. The overflow behaviour of float-to-int casts was undefined, full stop. There was some consensus as to what such casts should not do, but no agreement on what the actual behaviour should be. Eventually, the discussion settled on defining casting overflow to saturate, and the implementation was altered to match.
> Right, but given that panicing was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs?

Because casting to floats is not UB in the Rust spec, it's UB in LLVM. That's the whole reason this was an issue in the first place.

Now, Rust could have chosen to define the behavior to panic, but so far it's been a hard and fast rule in Rust that as casts do not panic. You would have to have a much better reason to change that then "well, it was UB before" since (1) nobody wanted it to be UB before, and (2) the actual implementation never panicked (and people absolutely rely on the fact that casts don't panic in unsafe code).

I thing you are right about that, I'd prefer panicking too. Also, reading the RFC will surely clear up the motivation.

Without knowing this case, I'd wager a guess: it's about performance. Panicking introduces a branch and side-effects which, again, affects negstively optimization potential and performance. The saturating cast affects performance too, but less. If some old code has a lot of number crunching containing these operations, a big performance regression would be nasty.

I think the long-term plan is to deprecate `as` entirely (probably in a future edition). You will then be free to pick between a function that panics, one that gives you a Result, or that saturates, etc. I believe most if not all of these functions already exist.
It looks like ‘as’ is generally defined as a truncating cast operator for other numeric types. Since it doesn’t panic for other overflow situations (like u64->u32), they chose consistency with those cases.

https://github.com/rust-lang/rust/issues/10184#issuecomment-...

It is't about performance because array indexing also panics.
Compilers are better at eliding array index checks than about eliding numeric overflow checks; an ordinary function that operates on arrays will often be able to reasonably determine both the length of the array and the scope of the indexing variable (Rust's iterators are very good at this), whereas an ordinary math function often has almost no information that usefully limits which values it might be getting called with.

I don't disagree that making more obvious errors into panics would be nice, but the performance implications are often quite unforgiving.

Seems a pity they didn't add a built-in at the same time that's a bit more nuanced. It could maybe return an enum with Success(value), Underflow, Overflow and NaN. That way it's up to the coder to decide whether they want a saturating cast or to check the result explicitly.
There are conversations about doing exactly that with `.try_into()` doing exactly that.
> there is no room in the type system to return an error

There is: you panic, like in the array case.

That would have been much robust (at the cost of performance).

For consistency you would also expect `257u32 as u8` to panic (which it doesn't, and never has); the `as` operator has always been about fast-and-lossy conversions, with the standard library ideally providing methods for more principled conversions.