by valenterry 1317 days ago
Too bad Rust doesn't have union types (aka adhoc / anonymous unions) yet.

Without them, using typed errors is very clumsy. Optimally, you would write the following code:

    fn foo(r1: Result<i32, Error1>, r2: Result<i32, Error2>) {
        let i1 = r1?;
        let i2 = r2?;
        // ...
    }
and Rust would infer the return type to be Result<String, Error1 | Error2> without having to do any extra definitions or conversions.

For anyone interested in what this would look like in Rust now, there are two ways. For libraries, people tend to recommend the thiserror crate. Code sample[0]:

    #[derive(thiserror::Error, Debug)]
    enum Error {
        #[error("One")]
        One(#[from] Error1),
        #[error("Two")]
        Two(#[from] Error2),
    }

    fn foo(r1: Result<i32, Error1>, r2: Result<i32, Error2>) -> Result<..., Error> {
        let i1 = r1?;
        let i2 = r2?;
        // ...
    }
Whereas for binaries, people usually recommend anyhow. Code sample[1]:

    fn foo(r1: Result<i32, Error1>, r2: Result<i32, Error2>) -> anyhow::Result<...> {
        let i1 = r1?;
        let i2 = r2?;
        // ...
    }
[0]: https://play.rust-lang.org/?version=stable&mode=debug&editio...
[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...
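For readers wondering what the `#[from]` attributes are doing there: below is a rough, std-only sketch of the conversions that make `?` work. It is a hand-written approximation, not the exact code thiserror generates. `Error1` and `Error2` are stand-in unit structs (the thread never defines them), and the `i32` return type is picked only so the example compiles.

```rust
use std::fmt;

// Stand-in error types; the thread never defines Error1 / Error2.
#[derive(Debug)]
struct Error1;
#[derive(Debug)]
struct Error2;

impl fmt::Display for Error1 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error one")
    }
}
impl fmt::Display for Error2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "error two")
    }
}
impl std::error::Error for Error1 {}
impl std::error::Error for Error2 {}

// The wrapper enum, written by hand instead of derived.
#[derive(Debug)]
enum Error {
    One(Error1),
    Two(Error2),
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Error::One(_) => write!(f, "One"),
            Error::Two(_) => write!(f, "Two"),
        }
    }
}

impl std::error::Error for Error {
    // Expose the wrapped error as the source of this one.
    fn source(&self) -> Option<&(dyn std::error::Error + 'static)> {
        match self {
            Error::One(e) => Some(e),
            Error::Two(e) => Some(e),
        }
    }
}

// These two impls are what let `?` convert each error into `Error`.
impl From<Error1> for Error {
    fn from(e: Error1) -> Self {
        Error::One(e)
    }
}
impl From<Error2> for Error {
    fn from(e: Error2) -> Self {
        Error::Two(e)
    }
}

fn foo(r1: Result<i32, Error1>, r2: Result<i32, Error2>) -> Result<i32, Error> {
    let i1 = r1?; // Error1 -> Error::One via From
    let i2 = r2?; // Error2 -> Error::Two via From
    Ok(i1 + i2)
}

fn main() {
    println!("{:?}", foo(Ok(1), Ok(2)));
    println!("{:?}", foo(Err(Error1), Ok(2)));
    println!("{:?}", foo(Ok(1), Err(Error2)));
}
```

This is the "extra definitions" cost the top comment is complaining about: the enum and the two `From` impls have to exist somewhere, whether you write them or a macro does.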
The binary vs library thing seems like an oversimplification to me. I think it's more like: do you need callers to handle this error specifically? With a library the answer is "I don't know, better let them do it", so you don't want anyhow. But in a binary, you may or may not, and it depends on the error.

The pattern I use in my app is to use thiserror, and then just have an anyhow catch-all. That lets me do specific stuff where I know I'm going to need specific handling, and an easy-to-use fallback for just saying "this bad thing happened" with the anyhow! macro.

    #[derive(Debug, thiserror::Error)]
    pub enum Error {

        #[error("Not logged in")]
        NotLoggedIn,

        #[error(transparent)]
        Api(#[from] ApiError),

        // etc

        #[error(transparent)]
        Other(#[from] anyhow::Error),
    }
I don't know if this is the best pattern but it's worked really well for me.
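To show how the call site of that pattern looks, here is a std-only sketch that swaps `anyhow::Error` for a boxed `dyn Error` so it runs without external crates. `AppError`, `BoxError`, `parse_age`, `load_profile` and `describe` are all hypothetical names made up for the illustration.

```rust
use std::error::Error as StdError;
use std::fmt;

// Stand-in for anyhow::Error in this sketch.
type BoxError = Box<dyn StdError + Send + Sync + 'static>;

#[derive(Debug)]
enum AppError {
    // A case the caller is expected to handle specifically.
    NotLoggedIn,
    // Catch-all for "some bad thing happened".
    Other(BoxError),
}

impl fmt::Display for AppError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AppError::NotLoggedIn => write!(f, "Not logged in"),
            AppError::Other(e) => write!(f, "{e}"),
        }
    }
}

impl StdError for AppError {}

// Plays the role of `#[from]` on the catch-all variant.
impl From<BoxError> for AppError {
    fn from(e: BoxError) -> Self {
        AppError::Other(e)
    }
}

// Hypothetical helper that only reports untyped errors.
fn parse_age(raw: &str) -> Result<u32, BoxError> {
    Ok(raw.trim().parse::<u32>()?)
}

// Hypothetical operation mixing a typed error with the fallback.
fn load_profile(logged_in: bool, raw_age: &str) -> Result<u32, AppError> {
    if !logged_in {
        return Err(AppError::NotLoggedIn);
    }
    let age = parse_age(raw_age)?; // BoxError -> AppError::Other
    Ok(age)
}

fn describe(result: Result<u32, AppError>) -> String {
    match result {
        Ok(age) => format!("age {age}"),
        Err(AppError::NotLoggedIn) => "redirect to login".to_owned(),
        Err(AppError::Other(e)) => format!("unexpected failure: {e}"),
    }
}

fn main() {
    println!("{}", describe(load_profile(true, "42")));
    println!("{}", describe(load_profile(false, "42")));
    println!("{}", describe(load_profile(true, "abc")));
}
```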
For simple tasks you can get away without any external crates by using `Result<T, Box<dyn Error>>`. But it's much more comfortable to use thiserror or anyhow in the long run.

    fn read_string() -> std::io::Result<String> {
        Ok("123".to_owned())
    }

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        let s = read_string()?; // io::Error
        let n = i32::from_str_radix(&s, 10)?; // num::ParseIntError
        println!("read number: {n}");
        Ok(())
    }
Except you often need `Result<T, Box<dyn Error + Send + 'static>>` if you go that route. At the very least, you should create a type alias for it. I very much prefer the use of `anyhow` and/or `thiserror` depending on if I need typed errors.
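A small sketch of the alias idea, for anyone who hasn't run into this yet. The alias names `BoxError` and `BoxResult` are made up, and I've added `Sync` to the bounds as well, which is a common choice but goes beyond what the comment above states. The thread is there to show where the `Send` bound starts to matter.

```rust
use std::error::Error;
use std::thread;

// Hypothetical aliases so the long type is written only once.
type BoxError = Box<dyn Error + Send + Sync + 'static>;
type BoxResult<T> = Result<T, BoxError>;

fn parse_number(s: &str) -> BoxResult<i32> {
    // `?` boxes the ParseIntError into BoxError.
    Ok(s.trim().parse::<i32>()?)
}

fn main() {
    // The closure's result crosses a thread boundary, so the error must be Send.
    let worker = thread::spawn(|| parse_number("not a number"));
    match worker.join().expect("worker thread panicked") {
        Ok(n) => println!("parsed {n}"),
        Err(e) => println!("failed: {e}"),
    }
}
```

With a plain `Box<dyn Error>` as the error type, the `thread::spawn` call would be rejected because the result is not `Send`.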
And the reason it works in anyhow is ... They have those conversions: https://docs.rs/anyhow/latest/anyhow/struct.Error.html#impl-...
what is the TL;DR difference between anyhow and Box<dyn Error> ?
It's detailed here - https://docs.rs/anyhow/1.0.66/anyhow/struct.Error.html - the list is short but IMO significant
Even if the "Ad hoc union" becomes a thing in Rust, you are not likely to get inference of return types.

The return type is part of the function signature, and Rust deliberately doesn't infer signatures. In languages with "too much" inference it becomes impractical for the human programmer to keep track of types because everything is inferred; this has started to be a problem in C++ as more and more things are auto. Rust has some very sophisticated inference inside a function (including partial inference and inferring types from how they're later used), but none for the signature.
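A tiny illustration of that split, as I understand it (the function name `even_squares` is made up): the signature is spelled out in full, while the locals inside are left to inference.

```rust
// The signature is written out in full: parameter and return types are never inferred.
fn even_squares(limit: u32) -> Vec<u64> {
    // No element type is given here; it is inferred from how `out` is used
    // later (the `push` below and the declared return type).
    let mut out = Vec::new();
    for n in 0..limit {
        if n % 2 == 0 {
            out.push(u64::from(n) * u64::from(n));
        }
    }
    out
}

fn main() {
    // Partial inference: we name the container, `_` is filled in by the compiler.
    let doubled: Vec<_> = even_squares(7).iter().map(|x| x * 2).collect();
    println!("{doubled:?}");
}
```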

While I don't strongly object to Rust's choice here, and I agree that production code should have a type signature on every function, I think this is more a place for lint/clippy/whatever. There's no need to gate the programmer trying something on them having produced a type signature that could be inferred.
You might be interested to know that rustc does have some limited ability to infer return types[1], but there are so many edge cases that the feature as it exists today isn't perfect (and won't let you produce a binary). I believe that 1) type alias = impl Trait; will make prototyping easier, and 2) we should allow -> _ in return types in private functions so that they become an allowed part of the language but critically can't become a semver hazard in crates' APIs.

[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

Where do I sign up to fight you on that last one? :D

Allowing _ in return position only for private functions feels like something that ends up surprising 99% of people, either because it works at all or because it doesn't work for their public function, and neither group is filled with joy as a result.

It's not a hill I'd die on, but it would go on my list of things I don't like in Rust, along with most as casts, impl AddAssign for String, is_ascii_predicates(/* taking */ &self)

The likelihood of it actually landing is quite low, because I suspect quite a few people would agree with you, and as long as rustfix can deal with it I also don't care as strongly about it ^_^
I think it's the job of the IDE to make that work, but I agree, without IDE this can quickly become a problem.
I think method signatures are part of application/typesystem design and should not be inferred. Explicitly provided types are a feature. Inferred/auto type signatures are a "necessary evil" to reduce boilerplate type declarations around code.

While codeblocks `fn1(fn2(), fn3)` and `var r1 = fn2(); var r2 = fn3; fn1(r1, r2)` are more or less identical, unless you have static type definitions for these methods you start having a very bad time inferring what types are being passed around.

Consider a typical python wrapper library with liberal use of **kwargs to pass non-wrapped arguments down to the wrappee. Those arguments and their types (as much as they are available in python, you get the idea) are entirely missing from the wrapper code and make changes at the call site pretty difficult.

> unless you have static type definitions for these methods you start having a very bad time inferring what types are being passed around.

It doesn't matter if the types are annotated explicitly or inferred. The amount of information is 100% the same. The IDE could just fill in the types _exactly_ in the same way as they would look like when annotated by hand - maybe just with a different color.

IntelliJ does this quite well, see for instance: https://i.stack.imgur.com/tiqjc.png

This is not a guess. These are the real types. There is literally no difference in the behaviour/semantics of the code.

> Consider typical python wrapper library with liberal

No, because python is not statically typed. You can't compare that.

> It doesn't matter if the types are annotated explicitly or inferred. The amount of information is 100% the same.

Explicit offers the opportunity for narrowing, comments, and sometimes choice of names. Otherwise, yes.

Good points!
I like OCaml's approach, where you write types in module signatures, but don't need to put them in the implementation.
The library ecosystem does fill that gap somewhat with crates like anyhow [1] which also allow you to add context to the errors you return.

1. https://docs.rs/anyhow/latest/anyhow/

Unfortunately this does not solve the problem. If you use this, you lose typesafety and the documentation you get through the (inferred) types. I.e. with union types you can see at a glance in your IDE which types of errors a function can return - this is too valuable to give it up.
You don't really lose type safety, rather it moves to runtime. Downcasting an anyhow error to the concrete internal type is certainly possible and won't panic in my experience.
"moves to runtime" exactly means to lose typesafety though.
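To make the disagreement concrete, here is a std-only sketch of runtime downcasting, using `Box<dyn Error>` rather than anyhow so it runs without external crates. `parse` and `classify` are hypothetical helpers. Note the final `else` branch: nothing in the signature of `parse` tells the compiler (or the reader) which concrete errors can actually show up, which is the point being argued.

```rust
use std::error::Error;
use std::num::ParseIntError;

// The signature only promises "some error".
fn parse(s: &str) -> Result<i32, Box<dyn Error>> {
    Ok(s.parse::<i32>()?)
}

// Recover the concrete type at runtime; a wrong guess yields None, not a panic.
fn classify(err: &(dyn Error + 'static)) -> &'static str {
    if err.downcast_ref::<ParseIntError>().is_some() {
        "parse error"
    } else if err.downcast_ref::<std::io::Error>().is_some() {
        "io error"
    } else {
        // The compiler cannot tell us whether this branch is reachable.
        "unknown"
    }
}

fn main() {
    let err = parse("abc").unwrap_err();
    println!("{}", classify(&*err));
}
```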
I know that this is a common wish, but anonymous sum types have pretty catastrophic impacts on type checking and lead to all sorts of bizarre corner cases like the following:

  let a = if cond {
    1 
  } else {
    1.0
  };
  
  a + 3
Now, the error would be pushed to the `+` operator because there isn't an `Add` for `f32 | u32`. Granted, this is a trivial example, and a programmer can easily see through it, but in general this can get very overwhelming and cause errors to leave their 'root cause'.
Typescript seems to have no problem with it (nor do I suspect F#/OCaml):

  class A {
    x = 10;
    scale(n: number): void {
      this.x *= n;
    }
  }
  class B {
    y = 10;
    scale(n: number): void {
      this.y *= n;
    }
  }  
  var z: A | B;
  z = new A
  z.scale(2);
  z = new B
  z.scale(2);
I was also looking for Sum Types/anonymous enums[0].

[0] https://news.ycombinator.com/threads?id=karmakaze&next=33505...

One option might be to differentiate syntactically between branches of an if/match that are allowed to expand their type (to a union or to a bigger union) and those that are not. I am not sure how far that generalizes, though.
> Now, the error would be pushed to the `+` operator because there isn't an `Add` for `f32 | u32`.

Why not? It makes total sense for one to exist. This is something the language needs to deal with, not the programmer. But the programmer can always sprinkle annotations so that errors can only reach so far.

> It makes total sense for one to exist.

I disagree: IME, when you add a float and an integer, you want to cast float to integer 50% of the time, and integer to float the remaining 50% of the time.

Even if it leads to more verbosity, I prefer arithmetic operations to be endomorphisms and use explicit casts.

I'm sure you can just use a linter or configure the compiler to error out in such a case if there are really no use-cases, no?
What do you mean? It already errors out if you try adding them together because it requires explicit casts. What the comment you're replying to is saying is that it's better to just explicitly cast than figure out what the compiler guesses.

    let x=19/10+0.5
I want to decide for myself if my result is 2.4, 1.5, 1, or 2 in the example above
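Spelled out with explicit casts, the four readings look roughly like this in today's Rust. The function name `four_readings` is made up for the sketch.

```rust
// Returns the four plausible meanings of `a / b + half`, in the order
// listed in the comment above: 2.4, 1.5, 1, 2 for (19, 10, 0.5).
fn four_readings(a: i32, b: i32, half: f64) -> (f64, f64, i32, i32) {
    // Everything in floating point: 1.9 + 0.5
    let all_float = f64::from(a) / f64::from(b) + half;
    // Integer division first (19 / 10 == 1), then widen: 1.0 + 0.5
    let int_div_then_float = f64::from(a / b) + half;
    // Everything as integers: 0.5 truncates to 0, so 1 + 0
    let all_int = a / b + half as i32;
    // Float arithmetic, truncated at the very end
    let float_then_trunc = (f64::from(a) / f64::from(b) + half) as i32;
    (all_float, int_div_then_float, all_int, float_then_trunc)
}

fn main() {
    let (w, x, y, z) = four_readings(19, 10, 0.5);
    println!("{w:.1} {x:.1} {y} {z}");
}
```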
The OP doesn't like that the error moves from the condition to the addition. And what I mean is that then just lint that there shouldn't be unions of number-types or so.
I think the ramifications of `Add` (and friends) for union types is interesting, but I think we can imagine alternatives that are clearly mistakes and so it seems like you're missing the point (... is my guess about the downvotes). Your last sentence makes an important point, though - relying on type inference always lets type errors propagate further than they would if everything was explicitly typed. Adding more annotations constrains that, although whether that would be sufficient is a more complicated discussion that's probably quite sensitive to the particulars of a given language.
Yeah that's true.

But also, for every developer with a couple of months of experience, this is not a new problem. It happens all the time, be it missing some parens or a dot or a return / semicolon (depending on the language of course).

Usually it's very easy to hunt it down - you see "a + 1" and you say "wait, a should be an integer - why is it not" and then you go back from there. But in general I agree that it adds some time in such cases that would be resolved quicker with annotations everywhere.

I think that taking options and results usually is an anti pattern. You want to handle the errors at creation-site and not propagate it to another function.

Of course this is just a minimal example... Others have talked about the real point of this comment, and it's just a minor nit. I wanted to point this out for new rust programmers reading this.

Anonymous sum types are something I want for error handling as well. In practice though I'm not sure it would really make my life that much better.
Not sum types. Those are union types. The difference is important, since if you work with two results (or two functions that return results) that use the same error-type you most often don't want to end up with a tuple of two times the same error but simply A<String, Error>.

Of course, if you care about which error is from which function, you can always easily do that by wrapping them into a sumtype, but in practice this is a rather rare use-case in application code at least.

How would that work in a memory safe language? Rust does have (named) untagged unions already (using the `union` keyword), but they are unsafe to use because there is no way to know statically which of the possible variants a given value contains.
You can obviously only call common methods or have to pattern match later and have a way to tell them apart. If you can't tell them apart, the compiler will tell you and you need to tag them somehow.
> You can obviously only call common methods

That sounds like trait objects/dynamic dispatch/`dyn`, which comes with runtime costs.

> or have to pattern match later and have a way to tell them apart.

That "way to tell them apart" is a tag, which would make it a tagged union/enum, not an untagged union. Those already exist, though not in an anonymous flavour.
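For comparison, here is roughly what those two options look like in Rust today, mirroring the TypeScript `A | B` example from earlier in the thread. This is only a sketch: `Scale`, `AOrB` and `value` are names I made up, and the enum has to be declared by hand, which is exactly what an anonymous union would avoid.

```rust
struct A {
    x: f64,
}
struct B {
    y: f64,
}

// The "common method" shared by both types.
trait Scale {
    fn scale(&mut self, n: f64);
    fn value(&self) -> f64;
}

impl Scale for A {
    fn scale(&mut self, n: f64) {
        self.x *= n;
    }
    fn value(&self) -> f64 {
        self.x
    }
}

impl Scale for B {
    fn scale(&mut self, n: f64) {
        self.y *= n;
    }
    fn value(&self) -> f64 {
        self.y
    }
}

// Option 1: a named tagged union; the match is static dispatch on the tag.
enum AOrB {
    A(A),
    B(B),
}

impl Scale for AOrB {
    fn scale(&mut self, n: f64) {
        match self {
            AOrB::A(a) => a.scale(n),
            AOrB::B(b) => b.scale(n),
        }
    }
    fn value(&self) -> f64 {
        match self {
            AOrB::A(a) => a.value(),
            AOrB::B(b) => b.value(),
        }
    }
}

fn main() {
    let mut z = AOrB::A(A { x: 10.0 });
    z.scale(2.0);
    println!("{}", z.value());
    z = AOrB::B(B { y: 10.0 });
    z.scale(2.0);
    println!("{}", z.value());

    // Option 2: a trait object; no enum needed, but calls go through a vtable.
    let mut boxed: Box<dyn Scale> = Box::new(A { x: 10.0 });
    boxed.scale(2.0);
    println!("{}", boxed.value());
}
```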

It does, but the developer decides. Also, errors are often propagated and only matched in exceptional cases, so I don't think the impact would be big.

> That "way to tell them apart" is a tag, which would make it a tagged union/enum, not an untagged union.

No. The difference is that a tagged union (sum type) is defined in advance but generally a union is not necessarily tagged but _can_ be tagged.

Example: say you have a tagged union with 3 different types / tags A, B and C. You can now define an adhoc/untagged union that is A | C. That means we can guarantee at compile time that we will be able to tell A and C apart later. But it is still not the same, because the combination of A and C was decided adhoc and was not necessarily predefined by the developer anywhere - which is what makes it different from A, B, C, which were specifically defined by the developer.

Unions don't have a discriminant. Anonymous Sum types have a discriminant, you just can't name it. Unions in Rust are unsafe because you can't tell what the underlying value will be.
Well, that depends. If the union consists of two types that share the same underlying structure, then obviously at runtime we can never know what the value is.

But otherwise we can. And this is something that we will know at compile-time, so we can prevent runtime-checks that would not work.

I must not be understanding what you're asking for because it sounds like anonymous sum types.
Maybe this is just about terminology.

But essentially, when it comes to union types, they behave like sets. The compiler merges them. (A | B) | (A | B) is the same as A | B. But for sum types (even anonymous ones such as tuples) the compiler can't merge them because that would lose information (if the result is from the first A | B or the second one). Instead, you end up with a nested structure.

Which one is desired depends on the use-case, but it's definitely different.
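A sketch of that difference using what Rust has today. `Either`, `Nested` and `merge` are hypothetical names; Rust has no union types, so the "merge" that a union would do implicitly is emulated here by an explicit function that forgets which side a value came from.

```rust
// A generic two-way sum type, defined by hand for the sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

#[derive(Debug, Clone, Copy, PartialEq)]
struct A;
#[derive(Debug, Clone, Copy, PartialEq)]
struct B;

// Sum of sums: the outer tag is kept, so there are four distinct cases.
type Nested = Either<Either<A, B>, Either<A, B>>;

// What a union would do automatically: (A | B) | (A | B) collapses to A | B,
// dropping the information about which side the value came from.
fn merge(n: Nested) -> Either<A, B> {
    match n {
        Either::Left(inner) | Either::Right(inner) => inner,
    }
}

fn main() {
    let first: Nested = Either::Left(Either::Left(A));
    let second: Nested = Either::Right(Either::Left(A));

    // As nested sum types the two values differ (the outer tag differs)...
    println!("nested equal: {}", first == second);
    // ...but after the union-style merge they are the same value.
    println!("merged equal: {}", merge(first) == merge(second));
}
```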

IIUC this kind of union would discriminate only on type.
It would make life a bit terser, but I’d rather have polymorphic variants. And maybe only anonymous enums over polymorphic variants.

That would make precise error handling on libraries quite a bit better.

„Yet”? Are there any plans to add them?
There are not. I don't believe they've ever been thoroughly proposed.