Hacker News new | ask | show | jobs
by pornel 864 days ago
Passing arguments in registers is not an optimization, but an ABI. It always happens up to a certain number of arguments, and Rust in particular uses an ABI that flattens more structs into registers than C++.

Other moves could be memcpy, but there's a distinction between Rust saying moves behave like memcpy, and moves actually being memcpys. String's 3×size_t (or 2×size_t for Box<str>/Arc<str>) is below LLVM's threshold of actual memcpy call. Rust has optimization passes for eliminating redundant moves.

You're giving an impression that memcpy happens all over the place, where in reality it's quite rare, and certainly doesn't happen in the simple cases you describe.

In Rust, knowledge of ownership and the zoo of strings is a requirement (e.g. use of &String is a novice error). It's nice that Mojo can hide it, and you could celebrate that without making dubious performance claims.

> True if you want it to be immutable, but this actually adds to my point.

Sigh, it adds to inaccuracies. Mutable strings are &mut String, passed as a single pointer, so a mutable string is an even better case of a thin reference that doesn't need memcpy.

> those values need to be available to the end of scope to satisfy Rust's guarantees

No they don't. You're conflating Rust's guaranteed Drop order (which does interfere with TCO) with borrow checking and stack usage, which don't. For references and Copy types, Rust has an "eager drop" behavior. Their existence on the stack is not guaranteed nor necessary.

Borrow checking scopes are hypothetical for the sake of the check, and don't influence code generation in any way. You can literally remove borrow checker and lifetimes from Rust entirely, and the code will compile to the same instructions — mrustc implementation is a real example of that.

Your example function where you try to demonstrate how the arguments prevent TCO compiles to a single `ret`.

> it should be instant and faster than Mojo

The `factorial()` is instant, but the `black_box` isn't because Rust/Criterion implements it differently than Mojo. So Mojo has a faster `benchmark.keep` function, and you failed to benchmark the relevant function, and presented a misleading benchmark with a wrong conclusion.

You should validate your claims on what Rust does by actually checking the output. Try https://rust.godbolt.org/ (don't forget to add -O to flags!) or using cargo-show-asm.

2 comments

Hi thanks for the discussion, I tried out a bunch of different benchmarks, and Rust TCO is actually working as you say it does. I removed that part from the blog. Thanks very much for the discussion, I definitely need to upskill on assembly.
Here's at least three Rust veterans in this thread explaining that move is just a memcpy which can be optimized away: https://users.rust-lang.org/t/move-semantics-rust-vs-c/61274...

I removed criterion::black_black from Rust and it had no performance difference, so updated the blog. If I removed benchmark.keep from Mojo it ran in less than a picosecond, so I left it in to be fair.

Can you show me a benchmark to dispute my claim about recursion? I'm not sure what the generated assembly of a function that's not being run is meant to prove.