Hacker News
by davrosthedalek 2165 days ago
I'm not sure I understand this. Does it not produce a run time error? Why not?

This looks very dangerous, because it essentially does the "nearest to right" thing. Say, you cast 256 to a u8, it's then saturated to 255. That's almost right, and a result might be wrong only by 0.5%. Much harder to detect than if it is set to 0.

3 comments

> I'm not sure I understand this. Does it not produce a run time error? Why not?

It’s not supposed to. Type casting with ‘as’ is supposed to be lightweight and always succeed; there is no room in the type system to return an error. In case lossless casting is not possible, some value still has to be returned. Until now, this was outright UB — meaning the compiler is not even obligated to keep it consistent from one build to another. Saturating, while still not optimal, is at least deterministic.
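To make the saturating behavior concrete, here is a minimal sketch, assuming Rust 1.45 or later (where float-to-int `as` casts saturate). The helper name `to_u8` is made up for illustration and is not part of any library.

```rust
/// Illustrative helper (hypothetical name): a plain float-to-int `as` cast.
/// On Rust 1.45+ this saturates instead of being undefined behavior.
fn to_u8(x: f32) -> u8 {
    x as u8
}

fn main() {
    println!("{}", to_u8(256.0));    // above the range: clamps to u8::MAX
    println!("{}", to_u8(-1.0));     // below the range: clamps to u8::MIN
    println!("{}", to_u8(f32::NAN)); // NaN maps to 0
    println!("{}", to_u8(41.9));     // in range: the fractional part is dropped
}
```

Note that none of these cases panics or returns an error; the caller gets a `u8` back no matter what the input was.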

> This looks very dangerous, because it essentially does the "nearest to right" thing.

That’s why the intention is to introduce more robust approximate conversion functions and eventually probably deprecate ‘as’ casts altogether. There have been a number of discussions about this; current disagreements seem to be about how to handle the various possible rounding modes.

> meaning the compiler is not even obligated to keep it consistent from one build to another.

Way worse than that. The compiler wasn't obligated to act like anything at all. It would be totally legal to compile it so that the first time the value was accessed you got 0, the next time you got 1 - within the same program execution, with no mutation of the value. That is the sort of thing that is observed behavior of UB in the worst cases, and why it's so terrible to just pretend that UB is innocuous.

Way worse than that, even. UB poisons every state of the program that eventually results in UB. For example, the optimizer is well within its rights to remove as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution.
That could literally produce no output program?
> That could literally produce no output program?

Way worse than even that (you might be noticing a theme here...). Once the optimizer has removed as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution, it can conclude that the other branch is now the only possible execution, and call it unconditionally, even if that leads to removing all your files (the classic example is https://kristerw.blogspot.com/2017/09/why-undefined-behavior...).

Yep! Dumb example.

    main()
      x = get_from_some_external_data_source()
      if x:
        print("Hello World")
        trigger_ub()
You might expect this code to always print if x is true but the optimizer can look at this and say "welp, if x is true then it would trigger ub, therefore it must be false, and since x must always be false we can just remove that entire branch."
My favorite example along these lines (in C) is "Cap'n'Proto remote vuln: pointer overflow check optimized away by compiler"[1] which was covered here a few years back and shows all of these "theoretical" compiler behaviors coming to a head in a real bug which is thoroughly explained.

1: https://news.ycombinator.com/item?id=14163111

cf. “nasal demons”
As it was UB anyway, wouldn’t the compiler also be within its rights to detect the problem and panic the thread?
Yes, it would. However, it rarely does because of how the incentives are: it's useful for optimization to pretend that UB never happens, and more optimization leads to faster binaries, which are what the compiler engineers often pursue.
Right, but given that panicking was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs? It seems much better to loudly proclaim the existence of an erroneous precondition, which is consistent with how things like array indexing behave.

I guess I’ll have to go dig up the RFC discussion on this one; it should make for interesting reading.

1. Panicking is not in line with how `as` casts are supposed to act. (e.g. `u32value as u8` does not panic but just takes the "lower" one byte.)

2. This might (I haven't profiled it) introduce performance regressions in ways which should not happen.

3. Apart from some uses around `dyn`, the other uses of `as` are steadily gaining alternatives. It's just a question of time until `as` (for int/float casts) is recommended against entirely, maybe even linted against.

4. Given the precedent of many other programming languages, people don't expect a "simple" float-to-int cast to be fallible. (The new methods replacing `as` make the fallibility clear, as it's e.g. `u64::try_from(bigf64)`).

5. Its undefined-ness is only detected/handled in LLVM. _I don't know_ if LLVM provides similarly well-integrated mechanisms for this as it does for integer overflows. If not, that would be another problem.

I came here with same question as davrosthedalek "Why doesn't it just panic?" and yours is an excellent and very convincing answer. Thanks for writing it up.
> Panicking is not in line with how `as` casts are supposed to act. (e.g. `u32value as u8` does not panic but just takes the "lower" one byte.)

So, instead of this being traditional UB, it was a combination of two separate issues:

- Rustc erroneously emitting code that exercised an LLVM UB case, and

- Imprecise Rust documentation around the exact behavior of float -> int ‘as’ casts

> Right, but given that panicking was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs?

Because float-to-int casting is not UB in the Rust spec, it's UB in LLVM. That's the whole reason this was an issue in the first place.

Now, Rust could have chosen to define the behavior to panic, but so far it's been a hard and fast rule in Rust that `as` casts do not panic. You would have to have a much better reason to change that than "well, it was UB before" since (1) nobody wanted it to be UB before, and (2) the actual implementation never panicked (and people absolutely rely on the fact that casts don't panic in unsafe code).

I think you are right about that, I'd prefer panicking too. Also, reading the RFC will surely clear up the motivation.

Without knowing this case, I'd wager a guess: it's about performance. Panicking introduces a branch and side effects which, again, negatively affect optimization potential and performance. The saturating cast affects performance too, but less. If some old code has a lot of number crunching containing these operations, a big performance regression would be nasty.

I think the long-term plan is to deprecate `as` entirely (probably in a future edition). You will then be free to pick between a function that panics, one that gives you a Result, or that saturates, etc. I believe most if not all of these functions already exist.
It looks like ‘as’ is generally defined as a truncating cast operator for other numeric types. Since it doesn’t panic for other overflow situations (like u64->u32), they chose consistency with those cases.

https://github.com/rust-lang/rust/issues/10184#issuecomment-...

It isn't about performance because array indexing also panics.
Seems a pity they didn't add a built-in at the same time that's a bit more nuanced. It could maybe return an enum with Success(value), Underflow, Overflow and NaN. That way it's up to the coder to decide whether they want a saturating cast or to check the result explicitly.
There are conversations about doing exactly that with `.try_into()`.
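For what it's worth, the enum idea above can be sketched in user code today. This is only an illustration under stated assumptions: `FloatCast` and `checked_f64_to_u8` are hypothetical names, not standard library items, and treating values in (-1.0, 0.0) as a successful conversion to 0 is a design choice of this sketch.

```rust
/// Hypothetical result type mirroring the enum suggested above.
#[derive(Debug, PartialEq)]
enum FloatCast {
    Success(u8),
    Underflow,
    Overflow,
    NaN,
}

/// Hypothetical checked conversion; not a standard library function.
fn checked_f64_to_u8(x: f64) -> FloatCast {
    if x.is_nan() {
        FloatCast::NaN
    } else if x <= -1.0 {
        FloatCast::Underflow
    } else if x >= 256.0 {
        FloatCast::Overflow
    } else {
        // x lies strictly between -1.0 and 256.0, so truncating toward
        // zero always yields a value in 0..=255.
        FloatCast::Success(x as u8)
    }
}

fn main() {
    for x in [42.7, 256.0, -1.0, f64::NAN] {
        println!("{:?} -> {:?}", x, checked_f64_to_u8(x));
    }
}
```

A caller who wants saturation can then map `Underflow` to 0 and `Overflow` to 255 explicitly, while a caller who wants to fail loudly can match on the error variants.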
> there is no room in the type system to return an error

There is: you panic, like in the array case.

That would have been much more robust (at the cost of performance).

For consistency you would also expect `257u32 as u8` to panic (which it doesn't, and never has); the `as` operator has always been about fast-and-lossy conversions, with the standard library ideally providing methods for more principled conversions.
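A small sketch of that contrast for integers, using only `as` and the standard `TryFrom` trait. The function names `truncate` and `checked` are made up for this example.

```rust
use std::convert::TryFrom;

/// Fast-and-lossy: `as` keeps only the low 8 bits and never panics.
fn truncate(x: u32) -> u8 {
    x as u8
}

/// Principled: `u8::try_from` reports out-of-range values instead of
/// silently wrapping them (converted to an Option here for brevity).
fn checked(x: u32) -> Option<u8> {
    u8::try_from(x).ok()
}

fn main() {
    println!("{}", truncate(257));   // low byte of 0x101
    println!("{:?}", checked(257));  // out of range for u8
    println!("{:?}", checked(200));  // fits in u8
}
```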
> Why not?

For better or worse, Rust 1.0 released with the philosophy that the `as` operator is for "fast and loose" conversions where accuracy is not prioritized; e.g. casting a u32 to a u8 would always risk silently truncating in the event the value was too large to represent. Over the years the language has added a lot of standard library support for bypassing the `as` operator entirely, and I think the prevailing opinion at this point might be that if they had to do it all over again they might not have had the `as` operator at all, instead making do with a combination of ordinary error-checked conversion methods and the YOLO unsafe unchecked methods as seen here.

Which is to say: it's not that they technically couldn't have gone with the panic approach, but (performance implications aside) I think they'd rather just start moving away from `as` in general wherever possible.
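As a rough sketch of the unchecked route mentioned above: the standard library has an `unsafe` method `to_int_unchecked` on `f32`/`f64` (stable since Rust 1.44), which requires the caller to rule out NaN, infinities, and out-of-range values. The wrapper name `fast_to_u8` is hypothetical, and the range check shown is just one way to meet that contract.

```rust
/// Hypothetical wrapper: checks the preconditions once, then uses the
/// unchecked conversion from the standard library.
fn fast_to_u8(x: f64) -> Option<u8> {
    // Comparisons with NaN are always false, so NaN falls through to None.
    if x > -1.0 && x < 256.0 {
        // SAFETY: x is finite, not NaN, and truncates to a value in 0..=255,
        // which is what `to_int_unchecked` requires of its caller.
        Some(unsafe { x.to_int_unchecked::<u8>() })
    } else {
        None
    }
}

fn main() {
    println!("{:?}", fast_to_u8(41.9));
    println!("{:?}", fast_to_u8(256.0));
    println!("{:?}", fast_to_u8(f64::NAN));
}
```

Calling `to_int_unchecked` without such a guard on an out-of-range value is undefined behavior, which is exactly the situation the old `as` cast was in by accident.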

It seems to me (FWIW as someone curious abt Rust but not seriously using it yet), that "as" should require a warning label. If "unsafe" means something else, then "unsound" could be used. If the types are always safe to convert, "as" is fine, but if the types involved allow for a lossy conversion in any case, the compiler would require you to write "unsound as". You are both acknowledging that you are aware of the issue and notifying future users of that code of a potential problem. Rust could go ahead with the saturating conversion to remove the UB AND require the "unsound" warning label.

If required to add an "unsound" label, that might be enough push for some developers to add their own explicit "if" guards or use another conversion method without having to deprecate "as".

"unsound" has a particular definition that is separate from what people have issue with here; in Rust parlance, there's nothing unsound about the `as` operator choosing to saturate or overflow etc., because that has no ability to break the guarantees provided by the type system.

As for requiring an additional speedbump (like e.g. a "lossy" label) here to guard against misuse, I think this proposal is overlooking something: Rust can't just abruptly break all code that currently uses `as` in order to demand that something like `lossy as` be used instead. Any removal would have to first have a very long period where `lossy as` is syntactically valid and where the compiler instead warns for people using raw `as`. But if the compiler is already emitting a mere warning for `as` that suggests a better alternative, then it could just as easily suggest a method like `.try_into()`, which exists today. And once you're having the compiler warn about changing `as` into something else, that's already indistinguishable from deprecating `as` in those instances, so there's no point trying to avoid it.

Should `as` then be a candidate for deprecation in the next edition?
I don't know about the next edition; `as` is also used for a small number of non-numeric conversions, like turning a reference into a raw pointer or for creating a trait object. In order to completely remove `as`, you would need to first introduce alternatives to those (the trait object use case specifically might be a bit weird as a method rather than a keyword, but I don't know if that's a big problem).

Alternatively one could just deprecate `as` for numeric conversions, although there are still some library holes that would need to be patched up (e.g. other than `as` I don't know of an existing single method to say that you want a fast integer conversion routine that merely truncates a u64 into a u32; right now the alternative is `try_into`, which does a runtime check and returns a Result). It might also exacerbate tensions with some people who for quite a while have been grumpy that Rust requires frequent explicit numeric conversions when doing things like indexing (which always requires a `usize` type, so you see `foo[bar as usize]` unless people want to always work with usizes directly); would these people be happier if that were `foo[bar.as_usize()]` instead, or would this all be blocked on a discussion about being more lenient with numeric conversions?
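For the indexing case specifically, here is a sketch of what avoiding `as` can look like with conversions that already exist in the standard library: `usize::from` for types that always fit (such as `u16`) and `usize::try_from` for ones that might not (such as `u64`). The function names `get_small` and `get_wide` are invented for this example.

```rust
use std::convert::TryFrom;

/// Index with a u16: `usize::from` is lossless, so no check is needed.
fn get_small(items: &[i32], idx: u16) -> Option<i32> {
    items.get(usize::from(idx)).copied()
}

/// Index with a u64: the conversion can fail on targets where usize is
/// narrower than 64 bits, so it goes through `try_from`.
fn get_wide(items: &[i32], idx: u64) -> Option<i32> {
    let i = usize::try_from(idx).ok()?;
    items.get(i).copied()
}

fn main() {
    let v = [10, 20, 30];
    println!("{:?}", get_small(&v, 1));
    println!("{:?}", get_wide(&v, 2));
    println!("{:?}", get_wide(&v, u64::MAX));
}
```

This is noticeably wordier than `foo[bar as usize]`, which is more or less the ergonomic tension described above.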

Most floating points aren't integers anyway, so if you think of it as casting 256 to the nearest u8 it's correct, same as rounding 0.25 to 0.

NaN to 0 is a bit more concerning, but inconvenience and compatibility need to be weighed against catching every error (no type system will catch every bug).