Hacker News new | ask | show | jobs
by omginternets 1281 days ago
What are monomorphic data types? What should be my first read on the subject?
1 comments

It's a fancy way of saying "every time this type is used, replace all the generic type params with what was used and generate code for it". It's how generics are implemented in Rust. If you have

    struct Foo<T>(T);
And you create Foo(42i32) and Foo(0.0f64), the compiler will create the equivalent to

    struct Fooi32(i32);
    struct Foof64(f64);
In other languages like Java, generics are implemented the way that Rust does "trait objects" (&dyn Trait).

Rust is not the only language that does this, to be clear.

If you're interested in a quick intro on the compiler side of this, you can read https://rustc-dev-guide.rust-lang.org/backend/monomorph.html

Expanding on trait objects: these are implemented as "V-Tables", structs holding pointers to the trait's methods and to the underlying type. This means that if you need to know what the underlying type, you have to do something fancy, usually referred to as "reflection". Also, invocation of generic functions that use V-Tables require "chasing pointers", which makes cache locality worse (because data might not be in the same cache read as the v-table itself), but makes the generated binary smaller (because if you have something like Foo<T> used with 1000 types, with monomorphization you end up with 2000 generated types in the binary, instead of 1001 with trait objects).
To add to this, even the Foo-wrapper is gone, just the i32 remains. Rust values are amorphous data blobs at runtime.
Yes, that's true but that is an implementation detail that only comes into play when dealing with ABI, and then you should be using #[repr(transparent)] to ensure that the compiler won't do something else :)
Sure, it’s good to point out the difference between “the behavior of a typical optimizing compiler” and “things actually guaranteed by the language”. The context of the discussion was the former, I think. I’m not even that certain that monomorphization is actually required in theory.
Yes, monomorphization isn't needed in theory, as long as the user-visible behavior remains the same, and in practice the team is exploring options[1] to identify cases where the currently manual practice of writing

    pub fn foo<T: AsRef<X>>(x: T) {
        inner_foo(x.as_ref());
    }
    fn inner_foo(_: &X) { todo!() }
can be instead done by the compiler automatically (turning monomorphized code back into polymorphic code, hence the polimorphization hame).

[1]: https://rustc-dev-guide.rust-lang.org/backend/monomorph.html...

ABI wise that is not true though. structs have struct ABI, even just a newtype struct around an integer will not use integer ABI unless annotated with #[repr(transparent)].
Pretty sure that some usage patterns of polymorphic types can not be completely monomorphized. Here's example in Golang:

    package main

    import (
        "fmt"
    )

    type wrapper[T any] struct {
        Value T
    }

    func (w wrapper[T]) String() string {
        return fmt.Sprintf("{%v}", w.Value)
    }

    func stringWrapped[T any](n int, v T) string {
        if n == 0 {
            return fmt.Sprintf("%v", v)
        }
        return stringWrapped(n-1, wrapper[T]{Value: v})
    }

    func main() {
        n := 0
        fmt.Scanf("%d", &n)
        result := stringWrapped(n, "test")
        fmt.Println(result)
    }
Go refuses to compile because it can't possibly generate all instances of wrapper[T] that this program may use: wrapper[string], wrapper[wrapper[string]], wrapper[wrapper[wrapper[string]]], etc.
Rust will complain about a recursion limit being reached during instantiation[1]. The solution in Rust is to use &dyn Trait or Box<dyn Trait> instead.[2]

[1]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

[2]: https://play.rust-lang.org/?version=stable&mode=debug&editio...

^ This blows the stack because it keeps calling itself with no break condition, but shows how the type system accepted the code.

I think this is called polymorphic recursion in Haskell circles.

In C++ you can monomorphize as long as you can somehow prove the recursion terminates at compile time (for example by threading a static recursion counter).

Nice examples - you can also have languages (like SML) where monomorphization is simply an implementation detail. Some compilers (e.g., MLton) perform monomorphization and others don't.
I only recently realized that certain type system features, like polymorphic recursion, make monomorphization impossible in the general case. In Haskell for example, it’s by necessity only an optimization that’s used where applicable, and not the general strategy.
That depends on what you mean. SML has "polymorphism" boiling down to being able to plug an arbitrary type in some places, which is denoted like 'a. But when people talk about generics, they more often talk about C++ templates, Java generics, Rust traits, etc. whose SML equivalent are signatures, structs and functors. Signatures are a bit like Rust traits, structs are a bit like Rust implementations of traits, whereas functors are like Rust's "templates", i.e. wherever you swap angle brackets to parametrise something with types constrained by traits, or values constrained by types. Except in Rust this parametrisation can be slapped on a bunch of things. It can be on structs, on functions, on traits, on implementations of traits etc. In SML you need to group all the "parametrised" things into a struct (and a corresponding signature), which is going to be returned by a functor.

And now the thing is: with transparent signature ascriptions, functors are monomorphised in SML, instead of everything being hidden behind signatures (as is in the case of Rust with traits when you use dyn), which has semantic consequences. E.g. a struct returned by a functor may contain a type. You can't perform proper type-checking without monomorphising, because you don't know what the exact type is. E.g. in the following program, the final line couldn't be type-checked without monomorphisation:

   signature ITERABLE = sig
       type ElemT
       type SrcT
   
       val new_iter: SrcT -> unit -> ElemT option
   end
   
   signature LIST_ELEM_TYPE = sig
       type T
   end
   
   functor ListIterFun (ListElemType: LIST_ELEM_TYPE): ITERABLE = struct
       type ElemT = ListElemType.T
       type SrcT = ElemT list
   
       fun new_iter l = let val lr = ref l
                        in
                          fn () => case !lr of
                                     nil => NONE
                                   | (x::xs) => (lr := xs; SOME x)
                        end
   
   end
   
   structure IntElemType: LIST_ELEM_TYPE = struct
       type T = int
   end
   
   structure IntListIter = ListIterFun(IntElemType)
   
   val next = IntListIter.new_iter [1, 2, 3, 4, 5]
If I change the signature ascription on ListIterFun to an opaque ascription (:> ITERABLE), the final line won't type-check, because it's not obvious from the signature, that ElemT is int. So transparent signature ascriptions require monomorphisation (Rust traits without dyn), and opaque signature ascriptions free the compiler from having to do monomorphisation (Rust traits with dyn*).

There was a lot of discussion of this issue when Go was settling on a design for its generics, under the phrase "reified generics".

Not exactly the same thing but JITs can turn dynamic objects into structs if the structure is consistent. JS runtimes and Julia do this as far as I know.
Firefox's js runtime also do tricks like generate multi copy of optimized function when the function has multi call site instead make one with lots of if else. So it no longer suffer from the problem that function that frequently get multi different type of parameters from different call site has poor performance.

It's probably exactly how templates work, except the details are invisible to users.

https://hacks.mozilla.org/2020/11/warp-improved-js-performan...

Yes! Java as well. And this is how those languages can show impressive benchmarks for consistent workloads. In theory they can even surpass AoT languages. In practice it depends on the specifics.
Julia doesn't do this. It just has structs in the first place.
I think cpp does this too
It indeed does. The only difference is that Rust has traits (similar to C++'s concepts) which require explicit mention of what interface the type parameters have inside the function, whereas C++'s templates will have a compile error after instantiation if you passed something that didn't meet the expected contract. This is closer to Rust's macros in operation.

Given

    fn foo<T>(a: T, b: T) -> T { a + b }
The compiler will complain that you should have been explicit on how T is going to be used:

    error[E0369]: cannot add `T` to `T`
     --> src/lib.rs:1:32
      |
    1 | fn foo<T>(a: T, b: T) -> T { a + b }
      |                              - ^ - T
      |                              |
      |                              T
      |
    help: consider restricting type parameter `T`
      |
    1 | fn foo<T: Add<Output = T>>(a: T, b: T) -> T { a + b }
      |         +++++++++++++++++
whereas in C++ this would have been accepted until you called foo with two things that couldn't be added together, like a Rust macro[1].

[1]: https://play.rust-lang.org/?version=nightly&mode=debug&editi...