by Manishearth 3662 days ago
What's nightly in Rust today will become stable soon enough (I don't know the timeline for SIMD). But OK, stable only. In that case, Rust is 2x slower than C code that can be optimized with SIMD. That's not much of an issue, really, and it proves nothing about Rust's overhead except that you can't rely on autovectorization. Not really a big deal.

> 19% is a huge difference.

IMO, in the realm of microbenchmarks, it really isn't. You clearly disagree; not much I can do about that.

> That C code isn't well-optimized at all...

Go ahead and fix it then. You've been telling me much the same. I already mentioned that the other benchmark you linked me to wasn't optimized.

> Does that mean it is impossible to prove to you that C is at least 2x faster than Rust, since twice is less than one order of magnitude?

I use the term loosely; 2x is certainly alarming. As long as you rely on SIMD benchmarks I will disagree, though, since in most cases the lack of that optimization isn't the reason your program is slow. If you really care about performance, use nightly Rust; there's no cost to that. I have yet to see production C code that uses SIMD everywhere possible, rather than just in some tight loops. That is not going to create a 2x difference in performance unless the tight loop dominates all else, and that is not most use cases.
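To make the "can't rely on autovectorization" point concrete, here is a minimal sketch on stable Rust. The function names `sum_scalar` and `sum_lanes` and the lane count of 8 are illustrative assumptions, not code from the benchmark under discussion. Float addition is not associative, so the compiler generally has to keep the plain loop's additions in order; splitting the sum into independent accumulators gives it room to use SIMD. Whether it actually does depends on the compiler version and target, which this sketch does not verify.

```rust
/// Straightforward reduction. The additions form one long dependency chain,
/// and because float addition is not associative the compiler generally
/// cannot reorder them into SIMD lanes on its own.
fn sum_scalar(xs: &[f32]) -> f32 {
    let mut total = 0.0;
    for &x in xs {
        total += x;
    }
    total
}

/// Same reduction with LANES independent accumulators (hypothetical helper).
/// Each accumulator only ever sees every LANES-th element, so the inner loop
/// is a candidate for autovectorization. Note that the result may differ
/// from `sum_scalar` in the last bits for general float input.
fn sum_lanes(xs: &[f32]) -> f32 {
    const LANES: usize = 8; // assumed width, not tuned for any CPU
    let mut acc = [0.0f32; LANES];
    let chunks = xs.chunks_exact(LANES);
    let tail = chunks.remainder();
    for chunk in chunks {
        for i in 0..LANES {
            acc[i] += chunk[i];
        }
    }
    // Combine the lanes, then add whatever did not fill a whole chunk.
    let mut total: f32 = acc.iter().sum();
    for &x in tail {
        total += x;
    }
    total
}

fn main() {
    let xs: Vec<f32> = (0..1000).map(|i| i as f32).collect();
    println!("scalar: {}", sum_scalar(&xs));
    println!("lanes:  {}", sum_lanes(&xs));
}
```

The trade-off is that the programmer, not the compiler, takes responsibility for the changed summation order.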


> Not much of an issue, really, and it proves nothing about Rust's overhead except that you can't rely on autovectorization.

Not much of an issue unless you actually need the performance, of course. IME, SIMD intrinsics are everywhere in code optimized to run as quickly as possible on x86. That about half of The Benchmark Game's benchmarks use SSE proves that point.

> Go ahead and fix it then. You've been telling me much the same. I already mentioned that the other benchmark you linked me to wasn't optimized.

That requires investing a lot of time in understanding how Ruby's internals, and especially its string objects, work. I don't have that time. The LPathBench, on the other hand, is self-contained, and updating it shouldn't be more than a few hours of work for a decent Rust programmer.

> Not much of an issue unless you actually need the performance, of course. IME, SIMD intrinsics are everywhere in code optimized to run as quickly as possible on x86. That about half of The Benchmark Game's benchmarks use SSE proves that point.

My point is that the Benchmark Game is not representative of real-world code. The website says as much. That the benchmarks use SSE everywhere does not mean that most code, even perf-sensitive code, will use SIMD everywhere.

Again, if you need simd, use a nightly. There's little to no drawback there.

I fixed it up to run on modern Rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with `cargo build --release`), and the numbers I get are:

(Ranges are just what I got from 5 runs, nothing scientific)

Rust: 610-630

c: 706-716

c_fast: 919?

cpp_clang: 669-694

cpp_plain: 717-728

I'm on a new Mac (i7, 16 GB), so I don't yet have g++ around (nor do I know how to obtain it without messing things up; I'm used to Linux). Everything here was done with clang.

Of course, this isn't an indication that Rust is faster than C. But it is an indication that it can be just as fast, and a reinforcement of my point about microbenchmarks having large error bars.

Edit:

On my older x86 Linux laptop (with gcc):

Rust: 844-987

c_fast: 808-860 (perhaps clang somehow made c_fast slower than c on the mac? shrug)

c: 982-1025

cpp_plain: 977-1019

cpp_gcc: 925-947

I think I've proven my point.
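The ranges above come from a handful of runs each. As a minimal sketch of how such a min-max range can be collected in-process, the following uses only the standard library; `time_runs` and the workload in `main` are hypothetical stand-ins, not the benchmark from the gist. A spread between the fastest and slowest run is a rough, unscientific indicator of the error bars being argued about.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `work` `runs` times and return the (fastest, slowest) wall-clock time.
/// Hypothetical helper for illustration; `runs` must be at least 1.
fn time_runs<F: FnMut()>(runs: usize, mut work: F) -> (Duration, Duration) {
    assert!(runs > 0, "need at least one run");
    let mut min = Duration::MAX;
    let mut max = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        work();
        let elapsed = start.elapsed();
        min = min.min(elapsed);
        max = max.max(elapsed);
    }
    (min, max)
}

fn main() {
    // Stand-in workload: a wrapping sum of squares over a vector.
    let data: Vec<u64> = (0..1_000_000).collect();
    let (min, max) = time_runs(5, || {
        // black_box keeps the optimizer from removing the computation.
        let total: u64 = black_box(&data)
            .iter()
            .map(|x| x.wrapping_mul(*x))
            .fold(0, u64::wrapping_add);
        black_box(total);
    });
    println!("range over 5 runs: {:?} to {:?}", min, max);
}
```

Reporting the minimum alongside the range is a common choice because background noise can only make a run slower, never faster.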

> My point is that the Benchmark Game is not representative of real-world code. The website says as much. That the benchmarks use SSE everywhere does not mean that most code, even perf-sensitive code, will use SIMD everywhere.

Your point is incorrect. SIMD is everywhere in performance-sensitive code, like memcpy, memset, strlen, strcmp, image and video decoding...
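For readers wondering why routines like strlen benefit from wide operations at all, here is a hedged sketch of the underlying idea in safe, stable Rust. It uses the classic "SWAR" zero-byte bit trick on 64-bit words rather than real SIMD intrinsics, and `has_zero_byte` and `find_nul` are illustrative names, not any libc's implementation. Production versions use actual vector registers, but the principle of testing many bytes per step is the same.

```rust
const LO: u64 = 0x0101_0101_0101_0101; // 0x01 in every byte
const HI: u64 = 0x8080_8080_8080_8080; // 0x80 in every byte

/// True if any of the eight bytes in `w` is zero.
/// Subtracting 1 from each byte borrows into the high bit only when the
/// byte was zero or already had its high bit set; `& !w` removes the latter.
fn has_zero_byte(w: u64) -> bool {
    (w.wrapping_sub(LO) & !w & HI) != 0
}

/// Index of the first NUL byte, scanning eight bytes per step
/// (a strlen-like search; hypothetical helper for illustration).
fn find_nul(bytes: &[u8]) -> Option<usize> {
    let mut offset = 0;
    let chunks = bytes.chunks_exact(8);
    let tail = chunks.remainder();
    for chunk in chunks {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        if has_zero_byte(u64::from_le_bytes(buf)) {
            // The word contains a NUL; locate it byte by byte.
            return chunk.iter().position(|&b| b == 0).map(|i| offset + i);
        }
        offset += 8;
    }
    // Fewer than eight bytes left: fall back to the plain scan.
    tail.iter().position(|&b| b == 0).map(|i| offset + i)
}

fn main() {
    let text = b"hello world\0rest";
    println!("{:?}", find_nul(text));
}
```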

> I fixed it up to run on modern Rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with `cargo build --release`), and the numbers I get are:

Note that the C benchmarks are all compiled with `-g -O2`. I'm not the author of that benchmark suite and it appears whoever is has abandoned the project.

If I fix the compiler switches (-O3 obviously) and recompile, the numbers I get are:

    Rust: 705
    C_fast: 630

I'm using Rust Nightly because I can't be bothered to install more than one Rust compiler.

That the numbers you are getting aren't stable suggests that you are using shoddy benchmarking techniques. Try to run them with as few applications open as possible.

Here are my updates to the c_fast benchmark:

https://gist.github.com/bjourne/4599a387d24c80906475b26b8ac9...

With this, c_fast's number is 532. That is a fair bit faster than Rust, and I'm sure someone who has more time than me and is more skilled at optimizing C code can improve it further.

I'm compiling with `clang -O3 -march=native -mtune=native -fomit-frame-pointer c_fast.c -o c_fast`, and my CPU is an "AMD Phenom(tm) II X6 1090T Processor".

That comparison is misleading for exactly the reasons others have said: the algorithms differ, as can be easily seen in their very different data structures.

A naive, line-by-line port of your fast variant to safe Rust (which I unfortunately am not allowed to share, but which required neither much thinking nor much time), without bothering with prefetching, gives me numbers more like:

    Rust-fast: 533
    C-fast: 685

I'm using --release for Rust (so no CPU-specific optimisation), and the same invocation as you for C. Everything except my editor is closed when benchmarking, and I'm on an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz.

You really can't cite benchmark results when you don't show the source.

I'm really, really sorry (I want to keep my job), but seriously, the code I benchmarked was a trivial reimplementation of your code. The get_max_cost_small2 function that is benchmarked is so small and simple that someone else doing it is likely to end up with something identical!

I'm not trying to act in bad faith: as a member of the Rust core team, that would be braindead and stupid on my part.