by sysk 2245 days ago
Stupid question but could Rust skip monomorphization during development builds and use dynamic dispatch? I heard generics were often to blame for slow builds.
4 comments

It could potentially do that, but there are a lot of things that "normal" monomorphized generic code can do (and does routinely) that would be very very difficult to implement dynamically.

This post goes into a lot of detail on the subject: https://gankra.github.io/blah/swift-abi/

There are perhaps other tradeoffs that could be made instead, like doing more implicit boxing in debug builds, but it's not clear to me that those would be any easier to reconcile with all of the low-level details that Rust exposes. E.g. what happens to `mem::size_of` when an object of a polymorphic type has been boxed?
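To make the `mem::size_of` question concrete, here is a minimal sketch. The helper `size_in_bytes` is a made-up name for illustration; the point is only that the value is a per-instantiation constant, and that boxing a type changes what the program observes.

```rust
use std::mem;

// Hypothetical helper: every monomorphized copy of this function
// returns a different compile-time constant.
fn size_in_bytes<T>() -> usize {
    mem::size_of::<T>()
}

fn main() {
    // Unboxed: the size is the size of the value itself.
    println!("u32:           {}", size_in_bytes::<u32>());
    println!("[u8; 64]:      {}", size_in_bytes::<[u8; 64]>());
    // Boxed: the size is that of a single pointer, whatever is behind it.
    println!("Box<[u8; 64]>: {}", size_in_bytes::<Box<[u8; 64]>>());
}
```

If debug builds boxed values implicitly, code that relies on these numbers (layout computations, `unsafe` buffers, FFI structs) would presumably have to keep seeing the unboxed answer.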

Related to this area is ongoing "polymorphization" work to detect pieces of generic code that actually do not depend on their type parameters, to avoid duplicating it: https://github.com/rust-lang/rust/pull/69749

This could potentially be made into a much finer-grained analysis to reduce the duplication further, though that may or may not actually help compile times depending on how expensive the analysis is.

In some cases maybe, but I don't think it could work in all cases. One of the things that monomorphization does for you is statically determine the size of your types. For example, if you have a `Vec<u32>` and a `Vec<u64>`, the implementation of `Vec` needs to multiply by the size of the element type (4 bytes vs 8 bytes) to figure out the offset of each element. I don't think regular dynamic dispatch provides size information, so the compiler might need to "invent" a new flavor of dynamic dispatch internally if it wanted to do this?
With dynamic dispatch, they'd all be fat pointer sized. It would move to `Vec<Box<dyn SomeTrait>>` where `SomeTrait` is auto-generated and implemented for `u32` and `u64`.
Ah I was thinking about "keep things allocated where they were before, but try to dynamically dispatch implementation code, to avoid compiling it N different times." It sounds like you're talking about "heap allocate everything by default." If everything was implicitly boxed, no type could be `Copy`, right? Is that strategy possible without breaking existing code?
Ah, you might be able to get away with a &dyn Trait too, not sure.
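A rough sketch of what the boxed version looks like when written by hand. `SomeTrait` and its `as_u64` method are stand-ins invented for this example, not anything the compiler actually generates.

```rust
use std::mem;

// Hypothetical stand-in for the "auto-generated" trait.
trait SomeTrait {
    fn as_u64(&self) -> u64;
}

impl SomeTrait for u32 {
    fn as_u64(&self) -> u64 {
        u64::from(*self)
    }
}

impl SomeTrait for u64 {
    fn as_u64(&self) -> u64 {
        *self
    }
}

// One compiled copy of this function handles both element types
// through the vtable.
fn sum(items: &[Box<dyn SomeTrait>]) -> u64 {
    items.iter().map(|x| x.as_u64()).sum()
}

// Size of each concrete value behind the pointer, looked up at run time.
fn pointee_sizes(items: &[Box<dyn SomeTrait>]) -> Vec<usize> {
    items.iter().map(|x| mem::size_of_val(&**x)).collect()
}

fn main() {
    let items: Vec<Box<dyn SomeTrait>> = vec![Box::new(1u32), Box::new(2u64)];
    println!("sum = {}", sum(&items));
    // Every element slot is one fat pointer (data pointer + vtable pointer).
    println!("slot size = {}", mem::size_of::<Box<dyn SomeTrait>>());
    println!("pointee sizes = {:?}", pointee_sizes(&items));
}
```

Worth noting: `mem::size_of_val` on a `dyn` reference does report the concrete size at run time, so trait objects carry some size information, though each element here still lives in its own heap allocation rather than inline in the `Vec`.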

You can box Copy types.

You can `Box` `Copy` types, yes, but the result is `!Copy`. So for example, maybe I have a slice of ints, and I'm calling `ptr::copy_nonoverlapping` on that slice. (Maybe via safe code, like `slice::copy_from_slice`.) That no longer works; I need new heap allocations for each int. And my slices of ints no longer have a memory layout that's compatible with C, so all FFI breaks?
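A small sketch of the difference, with made-up function names. For plain ints `copy_from_slice` is available because `u32: Copy`; for `Box<u32>` the closest safe equivalent is `clone_from_slice`, which allocates once per element.

```rust
// Plain ints: a bitwise copy of the whole slice.
fn copy_ints(src: &[u32]) -> Vec<u32> {
    let mut dst = vec![0u32; src.len()];
    dst.copy_from_slice(src); // requires T: Copy
    dst
}

// Boxed ints: `dst.copy_from_slice(src)` would not compile here,
// because Box<u32> is not Copy. Cloning allocates a new box per element.
fn clone_boxed(src: &[Box<u32>]) -> Vec<Box<u32>> {
    let mut dst: Vec<Box<u32>> = vec![Box::new(0); src.len()];
    dst.clone_from_slice(src); // requires only T: Clone
    dst
}

fn main() {
    println!("{:?}", copy_ints(&[1, 2, 3]));
    println!("{:?}", clone_boxed(&[Box::new(1), Box::new(2)]));
}
```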
Ah yeah.

I do think you're correct here, which is that this idea sounds simple but has a lot of edge cases.

Is it a goal of production vs debug build that the data structures that you define in your program are always the same?

I could see some kinds of funny differences in behavior between debug / production builds, if for a debug build a particular thing is boxed, but in production it isn't.

If it's a program-visible kind of box, yes that would be a problem.

But if you're generating polymorphic dispatch code at a lower level than expressible in Rust itself, there is no reason why hidden box types cannot be used to simulate unboxed types. That's just a memory representation difference. A bit like the way V8 creates specialised types for JavaScript objects at run time, but it's invisible to JavaScript except for speed.

Yes, I do also think this is a big risk with this strategy.
It's a tradeoff. Do you want your code to run as if it was written by hand (i.e. duplicated by hand)? Then monomorphization is best.

For Rust, it makes a lot of sense to pay for some compile speed penalty to not have any performance penalty.

> during development builds

That's the key phrase in his statement.

I'm not familiar with Rust, but if compilation speed is a major issue, and some of the compilation work can be skipped at the cost of runtime performance, it seems like a good idea to make that configurable per build type. Does Rust not offer this?

I've written an image processing tool in Rust, where running a large image downscaling test with a release build takes 2.0 seconds, and with a debug build it takes 18.7 seconds. So most of the time during development I compiled in release mode, because it was actually faster overall.

For many applications debug builds are fast enough, but their runtime performance is already so slow that I hope they don't get even slower.

It could certainly be a config flag on the profile that can be optionally enabled, and wouldn't need to be enabled by default.
Not currently, but it's been discussed a few times. Nobody has tried to write a patch yet, that I'm aware of.
> Do you want your code to run as if it was written by hand (i.e. duplicated by hand)? Then monomorphization is best.

That's not always true. When the vast majority of the object code is identical across all the actual instantiations, monomorphization bloats the text of the program, which means putting pressure on the instruction cache, which means more cache misses, which... is slow.

Somebody wrote a macro for the simple cases: https://llogiq.github.io/2019/05/18/momo.html
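For reference, the pattern that kind of macro automates can be written by hand roughly like this. The function name `with_txt_extension` is invented for the example; the idea is a thin generic shim that converts its argument and then calls a non-generic inner function, so the real body is compiled only once.

```rust
use std::path::{Path, PathBuf};

// Generic outer shim: one tiny copy per caller type (&str, String, PathBuf, ...).
pub fn with_txt_extension<P: AsRef<Path>>(path: P) -> PathBuf {
    // Non-generic inner function: the actual work is compiled a single time.
    fn inner(path: &Path) -> PathBuf {
        path.with_extension("txt")
    }
    inner(path.as_ref())
}

fn main() {
    println!("{}", with_txt_extension("notes.md").display());
    println!("{}", with_txt_extension(String::from("a/b.rs")).display());
}
```

This only helps when the generic parameter is used for a cheap conversion up front (`AsRef`, `Into` and similar), which is why it covers the simple cases and not generics in general.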