Hacker News new | ask | show | jobs
by angrygoat 2169 days ago
This rather niche fixing of unsafe behaviour is excellent: https://blog.rust-lang.org/2020/07/16/Rust-1.45.0.html#fixin...

I spent a few years as a scientific programmer and this is exactly the sort of thing that just bites you on the behind in C/C++/Fortran: the undefined behaviour can actually manifest as noise in your output, or just really hard to track down, intermittent problems. A big win to get rid of it.

8 comments

It would be nice if there were both a saturating-as and a overflow-is-a-bug-as, the later of which is also saturating but in debug builds get instrumentation to panic if it ever actually saturates.

The overflow-is-a-bug-saturating as would be the default, and there would be a separate sat_as for "I know this saturates, it isn't a bug.". ::sigh:: Rust went through this same debate for integers, initially rejecting the argument I'm giving here but switching back to it after silently-defined-integer-overflow concealed severe memory unsafty bugs in the standard library.

Well-defining something like saturation actually reduces the power of static and dynamic program analysis because it can no longer tell if the overflow was a programmer-intended use of saturation or a bug.

Having it undefined was better from a tools perspective, even if worse at runtime, because if a tool could prove that overflow would happen (statically) or that it does happen (dynamically) that would always be a bug, and always be worth bringing to the user's attention.

So now you still get "noise" in your output, but it's the harder to detect noise of consistent saturation, and you've lost the ability to have instrumentation that tells you that you have a flaw in your code.

So I think this is again an example of rust making a decision that hurts program correctness.

`as` is not considered idiomatic in Rust code,. `.into()` for infallible cases and `.try_into()` for fallible cases are preferred in almost all cases (in the case of floats there's still discussion around the correct behavior of `.try_into()` for edge cases).

> So I think this is again an example of rust making a decision that hurts program correctness.

Could you expand on other examples?

It might not be "idiomatic", but it isn't "unsafe", and so it is now providing a guaranteed behavior that... is it a bug? I certainly think saturation should always be considered a bug... like that's even worse to me than truncation as the semantics of truncation are at least interesting and related to mathematical basis of von neumann computing and sometimes extremely useful as a way to interact with memory-oriented systems, while the semantics of saturation are just subtle corruption that is almost useless to "use" for anything valuable (and are inconsistent, unless they are also doing it for integers? like if I cast a floating point to a 64-bit number and then to an 8-bit number do I get a truncated result instead of a saturated one?... I hope not, even though I reject saturation as a reasonable semantic). But if it is defined legal and guaranteed behavior, what should a tool do in this scenario? It can't say "you have a bug here" as maybe you wanted that behavior.
> But if it is defined legal and guaranteed behavior, what should a tool do in this scenario? It can't say "you have a bug here" as maybe you wanted that behavior.

Rust has a linter called Clippy which absolutely does call out things like that. It can do so because there are other mechanisms in the language which provide the same functionality without ambiguity. I believe casting with `as` is linted against by Clippy.

If this is truly a concern that you have for your code, then you can use the clippy tool to lint for any "as" conversions: https://rust-lang.github.io/rust-clippy/master/index.html#as...
"unsafe" has a very specific meaning in the context of Rust[1][2]: particularly letting you perform actions that can potentially cause undefined behavior. Defining `as` as saturating is removing UB. That being said, the reason it is considered unidiomatic is because saturating or wrapping might not be what your intention could be, so other ways of turning one type to another is needed and should be recommended (into and try_into). The compiler should help you in going in that direction[3], and even though rustc itself doesn't warn you away from using `as` casts, clippy can[4][5][6][7][8][9][10].

[1]: https://doc.rust-lang.org/nomicon/what-unsafe-does.html

[2]: https://nora.codes/post/what-is-rusts-unsafe/

[3]: https://play.rust-lang.org/?version=nightly&mode=debug&editi...

[4]: https://rust-lang.github.io/rust-clippy/master/#cast_lossles...

[5]: https://rust-lang.github.io/rust-clippy/master/#cast_possibl...

[6]: https://rust-lang.github.io/rust-clippy/master/#checked_conv...

[7]: https://rust-lang.github.io/rust-clippy/master/#char_lit_as_...

[8]: https://rust-lang.github.io/rust-clippy/master/#cast_sign_lo...

[9]: https://rust-lang.github.io/rust-clippy/master/#cast_precisi...

[10]: https://rust-lang.github.io/rust-clippy/master/#cast_possibl...

It is "removing" undefined behavior by simply defining it to be something no one would ever ever want ;P. If you are willing to just define behavior arbitrarily then I think you will find the vast majority of (but admittedly not all) undefined behaviors can be made "safe".
Nobody wants undefined behaviour. That's always worse than defining it, whatever that definition is. In any case, there are better options for selecting the behaviour you want instead of using `as`.
> `as` is not considered idiomatic in Rust code

Yes it is. It's used all the time. There's a reason as_conversions defaults to Allow in clippy.

Of course, there are lots of situations where `as` is the wrong tool for the job, but I think it's a bit of a stretch to call `as` "not idiomatic". It's perfectly idiomatic in lots of situations.

Whether or not it should be idiomatic is a separate question.

I think the parent poster is jumping the gun, but it is true that the flaws in `as` have been acknowledged for a long time and alternatives for specific use cases have been regularly introduced in order to reduce its use where possible. Giving it dedicated syntax might be one of the bigger warts of Rust 1.0: it may be the most convenient way to cast between primitive types, but it's often not the best way.

IOW, is it currently unidiomatic? No, code reviewers won't generally look sideways at it. But it's certainly getting less idiomatic over time, and that trend doesn't seem likely to stop soon.

Well or being a temporal throwback instead of gun jumping. The initial handling of integer overflow was another such decision in my view, but it was ultimately reversed.
Your claim that “as” being used is incompatible with it being unidiomatic is false.

The docs, the reference, and the RFCs all regards “as” as a mistake that should not be used, and _many_ features have been introduced over the years to reduce the cases in which “as” is the only alternative.

There are cases in which there is currently no alternative, so people “must” use “as”. This does not imply that it is idiomatic to do so.

But clippy will also scream at you if you convert a larger type to a smaller one.
Agree, a lot of people focus on Rust and memory safety vs C/C++, but the lack (nearly) of undefined behavior is at least as important/helpful.
Undefined behavior is memory safety; at least from Rust POV. A program with UB is not memory safe because the compiler makes no guarantees about the behaviors of those programs.
I'm not sure I understand this. Does it not produce a run time error? Why not?

This looks very dangerous, because it essentially does the "nearest to right" thing. Say, you cast 256 to a u8, it's then saturated to 255. That's almost right, and a result might be wrong only by 0.5%. Much harder to detect than if it is set to 0.

> I'm not sure I understand this. Does it not produce a run time error? Why not?

It’s not supposed to. Type casting with ‘as’ is supposed to be lightweight and always succeed; there is no room in the type system to return an error. In case lossless casting is not possible, some value still has to be returned. Until now, this was outright UB — meaning the compiler is not even obligated to keep it consistent from one build to another. Saturating, while still not optimal, is at least deterministic.

> This looks very dangerous, because it essentially does the "nearest to right" thing.

That’s why the intention is to introduce more robust approximate conversion functions and eventually probably deprecate ‘as’ casts altogether. There has been a number of discussions about this; current disagreements seem to be about how to handle the various possible rounding modes.

> meaning the compiler is not even obligated to keep it consistent from one build to another.

Way worse than that. The compiler wasn't obligated to act like anything at all. It would be totally legal to compile it so that the first time the value was accessed you got 0, the next time you got 1 - within the same program execution, with no mutation of the value. That is the sort of thing that is observed behavior of UB in the worst cases, and why it's so terrible to just pretend that UB is innocuous.

Way worse than that, even. UB poisons every state of the program that eventually results in UB. For example, the optimizer is well within its rights to remove as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution.
That could literally produce no output program?
> That could literally produce no output program?

Way worse than even that (you might be noticing a theme here...). Once the optimizer has removed as dead code any branch that, if taken, would provably lead to UB at some arbitrary future point of execution, it can conclude that the other branch is now the only possible execution, and call it unconditionally, even if that leads to removing all your files (the classic example is https://kristerw.blogspot.com/2017/09/why-undefined-behavior...).

Yep! Dumb example.

    main()
      x = get_from_some_external_data_source()
      if x:
        print("Hello World")
        trigger_ub()
You might expect this code to always print if x is true but the optimizer can look at this and say "welp, if x is true then it would trigger ub, therefore it must be false, and since x must always be false we can just remove that entire branch."
c.f. “nasal demons”
As it was UB anyway, wouldn’t the compiler also be within its rights to detect the problem and panic the thread?
Yes, it would. However, it rarely does because of how the incentives are: it's useful for optimization to pretend that UB never happens, and more optimization leads to faster binaries, which are what the compiler engineers often pursue.
Right, but given that panicing was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs? It seems much better to loudly proclaim the existence of an erroneous precondition, which is consistent with how things like array indexing behave.

I guess I’ll have to go dig up the RFC discussion on this one; it should make for interesting reading.

1. Panicing is not in line with how `as` casts are supposed to act. (e.g. `u32value as u8` does not panic but just takes the "lower" one byte.)

2. This might (I haven't profiled it) introduce performance regressions in ways which should not happen.

3. Besides in some usages around `dyn` other usages of `as` get increasingly more alternatives. It's just a question of time until `as` (for int/float casts) is recommended to not be used at all, maybe even linted against.

4. Given precedence of many other programming languages people don't expect a "simple" float to int cast to be failable. (The new methods replacing `as` make the fallibility clear, as it's e.g. `u64::try_from(bigf64)`).

5. It's udef-ness is only detected/handled in llvm, _I don't know_ if llvm provides similar well integrated mechanisms for this as it does for integer overflows. If not that would be another problem.

> Right, but given that panicing was already allowable behavior, why was the new behavior chosen to be one likely to introduce subtle bugs?

Because casting to floats is not UB in the Rust spec, it's UB in LLVM. That's the whole reason this was an issue in the first place.

Now, Rust could have chosen to define the behavior to panic, but so far it's been a hard and fast rule in Rust that as casts do not panic. You would have to have a much better reason to change that then "well, it was UB before" since (1) nobody wanted it to be UB before, and (2) the actual implementation never panicked (and people absolutely rely on the fact that casts don't panic in unsafe code).

I thing you are right about that, I'd prefer panicking too. Also, reading the RFC will surely clear up the motivation.

Without knowing this case, I'd wager a guess: it's about performance. Panicking introduces a branch and side-effects which, again, affects negstively optimization potential and performance. The saturating cast affects performance too, but less. If some old code has a lot of number crunching containing these operations, a big performance regression would be nasty.

Seems a pity they didn't add a built-in at the same time that's a bit more nuanced. It could maybe return an enum with Success(value), Underflow, Overflow and NaN. That way it's up to the coder to decide whether they want a saturating cast or to check the result explicitly.
There are conversations about doing exactly that with `.try_into()` doing exactly that.
> there is no room in the type system to return an error

There is: you panic, like in the array case.

That would have been much robust (at the cost of performance).

For consistency you would also expect `257u32 as u8` to panic (which it doesn't, and never has); the `as` operator has always been about fast-and-lossy conversions, with the standard library ideally providing methods for more principled conversions.
> Why not?

For better or worse, Rust 1.0 released with the philosophy that the `as` operator is for "fast and loose" conversions where accuracy is not prioritized; e.g. casting a u32 to a u8 would always risk silently truncating in the event the value was too large to represent. Over the years the language has added a lot of standard library support for bypassing the `as` operator entirely, and I think the prevailing opinion at this point might be that if they had the to do it all over again they might not have had the `as` operator at all, instead making do with a combination of ordinary error-checked conversion methods and the YOLO unsafe unchecked methods as seen here.

Which is to say: it's not that they technically couldn't have gone with the panic approach, but (performance implications aside) I think they'd rather just start moving away from `as` in general wherever possible.

It seems to me (FWIW as someone curious abt Rust but not seriously using it yet), that "as" should require a warning label. If "unsafe" means something else, then "unsound" could be used. If the types are always safe to convert, "as" is fine, but if the types involved allow for a lossy conversion in any case, the compiler would require you to write "unsound as". You are both acknowledging that you are aware of the issue and notifying future users of that code of a potential problem. Rust could go ahead with the saturating conversion to remove the UB AND require the "unsound" warning label.

If required to an an "unsound" label, that might be enough push for some developers to add their own explicit "if" guards or use another conversion method without having to deprecate "as".

"unsound" has a particular definition that is separate from what people have issue with here; in Rust parlance, there's nothing unsound about the `as` operator choosing to saturate or overflow etc., because that has no ability to break the guarantees provided by the type system.

As for requiring an additional speedbump (like e.g. a "lossy" label) here to guard against misuse, I think this proposal is overlooking something: Rust can't just abruptly break all code that currently uses `as` in order to demand that something like `lossy as` be used instead. Any removal would have to first have a very long period where `lossy as` is syntactically valid and where the compiler instead warns for people using raw `as`. But if the compiler is already emitting a mere warning for `as` that suggests a better alternative, then it could just as easily suggest a method like `.try_into()`, which exists today. And once you're having the compiler warn about changing `as` into something else, that's already indistinguishable from deprecating `as` in those instances, so there's no point trying to avoid it.

Should `as` then be a candidate for deprecation in the next edition?
I don't know about the next edition; `as` is also used for a small number of non-numeric conversions, like turning a reference into a raw pointer or for creating a trait object. In order to completely remove `as`, you would need to first introduce alternatives to those (the trait object use case specifically might be a bit weird as a method rather than a keyword, but I don't know if that's a big problem).

Alternatively one could just deprecate `as` for numeric conversions, although there are still some library holes that would need to be patched up (e.g. other than `as` I don't know of an existing single method to say that you want a fast integer conversion routine that merely truncates a u64 into a u32; right now the alternative is `try_into`, which does a runtime check and returns a Result). It might also exacerbate tensions with some people who for quite a while have been grumpy that Rust requires frequent explicit numeric conversions when doing things like indexing (which always requires a `usize` type, so you see `foo[bar as usize]` unless people want to always work with usizes directly); would these people be happier if that were `foo[bar.as_usize()]` instead, or would this all be blocked on a discussion about being more lenient with numeric conversions?

Most floating points aren't integers anyway, so if you think of it as casting 256 to the nearest u8 it's correct, same as rounding 0.25 to 0.

NaN to 0 is a bit more concerning, but inconvienence and compatibility need to be weighed against catching every error (no type system will catch every bug).

Fortran predictably overflows the result, in contrast to fptoui which gives a poison value. I agree that even a predictable overflow can bite you on the behind. But it's better than undefined behavior. I'm not a fan of saturated cast, but having a defined behavior is for sure a great improvement.
When working with audio as an example, saturation is much less obnoxious than wraparound. You can potentially destroy speakers and your hearing that way.
Wraparound is also the bug that turns Ghandi into a huge dictator in Civilization, and killed a couple of people in cancer treatment devices.
For those not in the know, computer controlled Civs would be assigned behaviour attributes based on historical personality. Each attribute would be 1-10, stored in an 8 bit integer.

These attributes change in response to events in the game. When a civilization would reach the modern era, it's aggression attribute would be reduced by 2. Gandhi's aggression was reduced from 1 to 255 (underflow). He would rain down nukes on everything in sight, somewhat uncharacteristic of him.

Fans liked this so much that Firaxis maintained the same behaviour in later games.

It's a cool story, but I don't think there is evidence that it actually happened (in Civ 1, certainly in later games as a homage to this potentially urban legend). The YouTube channel People Make Games did an investigation[1] about this a year ago, including interviews with the original developers.

[1] https://www.youtube.com/watch?v=Ur3SdgkW8W4

TIL that I am fake news.
Saturated addition is indeed the correct way to add two PCM streams together.

Especially back in the 1980s and 1990s you'd get awful code that did things like averaging the two streams because wrapping sounds awful and the authors were ignorant of the theory and/or unaware that saturated addition is a thing.

You can tell when somebody did this because it means playing silence makes everything else quieter, or worse there's an arbitrary limit on how many streams are played and playing any one thing is very quiet because it's attenuated so as to never wrap.

Haiku the 1990s-style operating system did this for years, as did various Amiga music software.

At least on x86, saturated adds didn't become a thing until MMX was released. Doing saturated adds was simply too expensive on older machines, and usually required a branch.
Why would you want math errors to be less obnoxious?
They're not math errors, it's the most sensible thing to do: when you overload a speaker, you get clipping, not some weird thing where the cone snaps to the opposite side of the range.
I’m talking about the basis of the analogy, not the analogy. If you’re processing audio signals, saturation (i.e. clipping) is of course preferable to the alternative.
I think they just prefer them to be less destructive.
“Destructive”, regardless how minor, should be intentional or at the very least conscious. This is one of the various reasons why serious audio processing is done as 32 bit float: no (practical) risk of unexpected overflows.
> Fortran predictably overflows the result

Not sure what you mean here, and I don't have the standard at hand ATM, but I'm quite sure this is undefined behaviour in Fortran.

But yes, I agree defined behaviour is good. Undefined behaviour is occasionally good for optimization, at the cost of gray hairs for users.

Standard? True, undefined. Compilers, however, happily overflow unless you tell them to catch it at run time. So, my original statement is at least half false. :)
Rust has no specification.
Sort of; we have more than nothing, but not a full thing.

Some things are well specified. Some things are mostly specified. Some things are still very much up in the air.

Do you have specification of a memory model?
It's work in progress, however you may want to take a look at https://rust-lang.github.io/unsafe-code-guidelines/. As far thread safety goes (threads, locks, atomics), memory model matches C++ specification.
I believe it's safe to assume that Rust's memory model is (a subset of) C++11's memory model, since that's what LLVM implements. At this point I don't see how any specification could deviate significantly from that without breaking tons of code.
The Rust memory model is deviating from it a little in order to enable some norestrict-based optimizations that aren't really done for C, even though (as you know) LLVM can't really take advantage of them yet.
I imagine that in C if the compiler can prove that two pointers don't alias than it can elide a load?
Not yet. That falls under the "underspecified, but there is active work ongoing" bit. There's a lot there.
It is nice that they have defined behavior for that now.

Though I try to always scrutinize any floating-point / integer conversions during code reviews. The default casting of a floating point value to integer is frequently not what you want, however. In the code we do, for example, you will usually round to the nearest integer instead, we don't normally need the more fancy rounding schemes.

Out of interest, with (my instance of!) 1.44, "let i = 257.0; i as u8" casts to 1. "let i = usize::MAX as f64 + 1.0; i as usize" does likewise. If you assign i as an integer DIRECTLY, however, it saturates.

To be honest... struggling to see why you'd do the former, outside of situations where you're happy with saturation, though I haven't thought about it a lot. Agreed that a consistent behaviour is a big help - I can work with "Rust does x in this scenario".

Again, UB means the compiler is allowed to do anything at all with code that triggers UB. It’s not just that the value you get is undefined, the behavior of the whole program becomes suspect. Including anything that happens before the UB is actually triggered, not just after that.
In Fortran, there is the ieee_arithmetic intrinsic module that can very nicely and robustly handle undefined behavior. If people do not use it, it is their problems.