by alexcrichton 3606 days ago
In Rust it's frequently the case that slow compilations are dominated by generating and optimizing LLVM IR. The codegen step (generating LLVM IR) often takes a while simply because we're generating so much IR.

Rust takes an approach with generic functions called monomorphization which means that we generate a new version of each function for each set of generics it's instantiated with. This means that a future of a String will generate entirely different code from a future of an integer. This allows generics to be a zero cost abstraction because code is optimized as if you had substituted all the generics by hand.
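
To make the monomorphization point concrete, here is a minimal sketch. The function name `describe` is made up for illustration; the idea is only that each distinct type argument leads the compiler to emit a separate copy of the function body.

```rust
use std::fmt::Display;

// Hypothetical generic function: the compiler generates one copy of this
// body for every concrete `T` it is called with.
fn describe<T: Display>(value: T) -> String {
    format!("value = {}", value)
}

fn main() {
    // Two instantiations are generated here: one for `i32` and one for
    // `String`. Each is optimized as if it had been written by hand.
    println!("{}", describe(42));
    println!("{}", describe(String::from("hello")));
}
```

Every additional type passed to `describe` adds another copy to the LLVM IR, which is where the compile-time cost comes from.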

Putting all that together, highly generic programs will generally trend towards higher compile times. With all the generics in play, there tends to be a lot of monomorphization which causes quite a lot of LLVM IR to get generated.

As with many aspects of Rust, however, you have a choice! Rust supports what we call "trait objects" which is a way to take a future and put it behind an allocation with a vtable (virtual dispatch). This forces the compiler to generate code immediately when a trait object is created, rather than down the line when something is monomorphized.

Put another way, you've got control over compile times if you're using futures. If you're taking a future generically and that takes too long to compile, you can instead take a trait object (or quickly convert it to a trait object). This will help cut down on the amount of code getting monomorphized.
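
A rough sketch of that choice follows. It assumes a made-up `Task` trait standing in for a future, since a real future would need an executor to run; `Double`, `AddOne`, `run_generic`, and `run_boxed` are hypothetical names too.

```rust
// Hypothetical trait standing in for a future-like abstraction.
trait Task {
    fn run(&self) -> u32;
}

struct Double(u32);
struct AddOne(u32);

impl Task for Double {
    fn run(&self) -> u32 {
        self.0 * 2
    }
}

impl Task for AddOne {
    fn run(&self) -> u32 {
        self.0 + 1
    }
}

// Generic version: monomorphized once per concrete `T` it is called with,
// so the call to `run` is statically dispatched and can be inlined.
fn run_generic<T: Task>(task: T) -> u32 {
    task.run()
}

// Trait-object version: compiled a single time; the call to `run` goes
// through the vtable stored alongside the boxed value.
fn run_boxed(task: Box<dyn Task>) -> u32 {
    task.run()
}

fn main() {
    println!("{}", run_generic(Double(21)));
    println!("{}", run_boxed(Box::new(AddOne(41))));
}
```

Switching a hot generic entry point to the boxed form trades an allocation and an indirect call for less generated code.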

So in general futures shouldn't make compilation worse. You'll occasionally have a choice between performance (no boxes) and compile times (boxing), but that's basically already the case in Rust today.

Do note that with MIR the focus is polymorphic optimizations - reducing the LLVM IR for all monomorphizations of a generic function, at once.

Unlike C++ templates, Rust enforces a single definition with uniform semantics for a generic type/function (specialization goes through the existing trait static dispatch mechanism), so we can take advantage of that to reduce compile times.

Hm. I've heard arguments that C# or Java are slow for multiple reasons, but never because of the minuscule overhead of a virtual method dispatch when using objects behind interfaces (kind of similar to trait objects).

It's interesting that this is seen as significant here. Are we dealing with much shorter timescales, or just being eager to optimise everything?

Virtual dispatch per se is not terribly slow, as long as the branch is predictable by the CPU. The problem is that virtual dispatch prevents the sort of aggressive inlining and interprocedural optimizations that C++ compilers are known to do. C# and Java JITs get around that via runtime analysis and speculative inlining, but that is done at runtime and eats away some of the precious little time available for optimisations.

Edit: spelling

Put it this way:

The cost of a branch misprediction is 10s of CPU cycles (1), on a clock measured in gigahertz (10^9 cycles per second).

Time to turn around a web request is, if you're very lucky and have done the work, mainly about getting a value from an in-memory cache at multiple milliseconds (2). That's 1 / (10^3) seconds.

If you're not lucky, 10s or 100s of milliseconds to generate the response.

It seems that the second duration is best case around 10^6 times longer. I would not sweat the first one.

1) http://stackoverflow.com/a/289860/5599
2) http://synsem.com/MCD_Redis_EMS/
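
The back-of-the-envelope comparison above can be sketched as a small calculation. The concrete numbers (20 cycles, a 2 GHz clock, a 5 ms request) are assumptions picked for illustration, not figures taken from the linked pages, and the function names are hypothetical.

```rust
// Time lost to one branch misprediction, given its cost in cycles and the
// clock frequency in Hz.
fn misprediction_seconds(cycles: f64, clock_hz: f64) -> f64 {
    cycles / clock_hz
}

// How many times longer a request takes than a single misprediction.
fn slowdown_ratio(request_seconds: f64, penalty_seconds: f64) -> f64 {
    request_seconds / penalty_seconds
}

fn main() {
    // Assumed values: "10s of cycles" taken as 20, clock taken as 2 GHz.
    let penalty = misprediction_seconds(20.0, 2.0e9);
    // Assumed value: "multiple milliseconds" taken as 5 ms.
    let ratio = slowdown_ratio(5.0e-3, penalty);
    println!("penalty: {:e} s, ratio: {:e}", penalty, ratio);
}
```

With these assumed inputs the ratio comes out at several times 10^5, within an order of magnitude of the estimate above.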

Contrary to popular belief, not all C++ programs (or Rust programs, FWIW) are web servers serving HTTP requests over the Internet.
Yep, that's why I'm asking about the use-cases in the grandparent comment.
As an example, many real-time systems are a giant ball of messy asynchronous code and state machines. Futures can help with that, although lately I have found that sometimes the best, cleanest way to implement a state machine is to make it explicit.
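
For what "explicit" could look like, here is a minimal sketch of a state machine written as an enum plus a transition function. The connection lifecycle, the `State` and `Event` names, and the transitions are all invented for illustration.

```rust
// Hypothetical states of a connection.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum State {
    Idle,
    Connecting,
    Connected,
    Closed,
}

// Hypothetical events that drive the machine.
#[derive(Debug, Clone, Copy)]
enum Event {
    Dial,
    Established,
    Hangup,
}

// The whole transition table lives in one match, so every legal move is
// visible in one place.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Dial) => State::Connecting,
        (State::Connecting, Event::Established) => State::Connected,
        (_, Event::Hangup) => State::Closed,
        // Events that do not apply in the current state leave it unchanged.
        (s, _) => s,
    }
}

fn main() {
    let events = [Event::Dial, Event::Established, Event::Hangup];
    let mut state = State::Idle;
    for &event in events.iter() {
        state = step(state, event);
        println!("{:?}", state);
    }
}
```
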
I assume at least part of it is that "zero-cost abstractions" is a fairly objective and boolean metric to calculate. "Is this performance impact significant enough to worry about?" would probably result in a lot more bikeshedding.
A tremendous amount of effort has gone into the CLR towards optimizing interface dispatch, because at one time it was slow. Interface dispatches are cached at the call site to avoid real virtual (vtable) dispatch, just like a Smalltalk or JavaScript VM would.