by Manishearth 3661 days ago
> Not much of an issue unless you actually need the performance, of course. IME, SIMD intrinsics are everywhere in code optimized to run as quickly as possible on x86. That about half of The Benchmark Game's benchmarks use SSE proves that point.

My point is that the Benchmark Game is not representative of real-world code; the website says as much. The fact that the benchmarks use SSE everywhere does not mean that most code, even performance-sensitive code, will use SIMD everywhere.

Again, if you need SIMD, use a nightly compiler. There's little to no drawback there.

I fixed it up to run on modern rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with cargo build --release), and the numbers I get are:

(The ranges are just what I got from 5 runs; nothing scientific.)

Rust: 610-630

c: 706-716

c_fast: 919?

cpp_clang: 669-694

cpp_plain: 717-728

I'm on a new Mac (i7, 16 GB), so I don't have g++ around yet (nor do I know how to install it without messing things up; I'm used to Linux). Everything here was compiled with clang.

Of course, this isn't an indication that Rust is faster than C. But it is an indication that it can be just as fast, and a reinforcement of my point about microbenchmarks having large error bars.

Edit:

On my older x86 Linux laptop (with gcc):

Rust: 844-987

c_fast: 808-860 (perhaps clang somehow made c_fast slower than c on the Mac? shrug)

c: 982-1025

cpp_plain: 977-1019

cpp_gcc: 925-947

I think I've proven my point.
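A crude way to check what these ranges actually support is to ask whether two implementations' ranges overlap at all, and how wide each range is relative to its midpoint. The sketch below is a hypothetical helper, not code from either gist; the numbers are copied from the Mac (clang) and Linux (gcc) results above, in whatever units the benchmark prints. Non-overlapping ranges (Mac: Rust vs c) hint at a real gap, while overlapping ones (Linux: Rust vs c_fast) are inconclusive with only five runs.

```rust
/// A timing range (min..=max) observed over several runs, in the
/// benchmark's own units. Hypothetical helper type for illustration.
#[derive(Clone, Copy, Debug)]
struct Range {
    lo: u32,
    hi: u32,
}

impl Range {
    /// True if the two closed intervals share at least one value.
    fn overlaps(self, other: Range) -> bool {
        self.lo <= other.hi && other.lo <= self.hi
    }

    /// Width of the range relative to its midpoint, in percent.
    fn spread_pct(self) -> f64 {
        let mid = (self.lo + self.hi) as f64 / 2.0;
        (self.hi - self.lo) as f64 / mid * 100.0
    }
}

fn main() {
    // Ranges copied from the Mac (clang) and Linux (gcc) runs above.
    let mac_rust = Range { lo: 610, hi: 630 };
    let mac_c = Range { lo: 706, hi: 716 };
    let linux_rust = Range { lo: 844, hi: 987 };
    let linux_c_fast = Range { lo: 808, hi: 860 };

    println!("mac: overlap = {}", mac_rust.overlaps(mac_c)); // false
    println!("linux: overlap = {}", linux_rust.overlaps(linux_c_fast)); // true
    println!("linux Rust spread = {:.1}%", linux_rust.spread_pct()); // 15.6%
}
```

Overlap of min/max ranges is not a statistical test; it only flags cases where the run-to-run noise is at least as large as the difference being claimed.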


> My point is that the Benchmark Game is not representative of real-world code; the website says as much. The fact that the benchmarks use SSE everywhere does not mean that most code, even performance-sensitive code, will use SIMD everywhere.

Your point is incorrect. SIMD is everywhere in performance-sensitive code: memcpy, memset, strlen, strcmp, image and video decoding...
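To illustrate why routines like strlen benefit from data parallelism: instead of testing one byte per iteration, you test many bytes per operation. Optimized libc implementations on x86 typically do this with SSE2/AVX2 vector instructions; the sketch below swaps in the portable "SIMD within a register" (SWAR) variant of the same idea, checking 8 bytes at a time in a `u64` with the classic has-zero-byte bit trick. The function names `has_zero_byte` and `swar_strlen` are hypothetical, and this is not how any particular libc implements strlen.

```rust
/// Returns true if any byte of `word` is zero, using the bit trick
/// (x - 0x0101..01) & !x & 0x8080..80, which is nonzero iff x has a zero byte.
fn has_zero_byte(word: u64) -> bool {
    const LO: u64 = 0x0101_0101_0101_0101;
    const HI: u64 = 0x8080_8080_8080_8080;
    (word.wrapping_sub(LO) & !word & HI) != 0
}

/// Length of a NUL-terminated byte string, scanning 8 bytes per step.
/// Panics if `buf` contains no 0 byte.
fn swar_strlen(buf: &[u8]) -> usize {
    let mut offset = 0;
    // Skip whole 8-byte chunks that contain no zero byte.
    for chunk in buf.chunks_exact(8) {
        let mut word = [0u8; 8];
        word.copy_from_slice(chunk);
        if has_zero_byte(u64::from_le_bytes(word)) {
            break;
        }
        offset += 8;
    }
    // Finish byte by byte inside the chunk holding the NUL (or the short tail).
    offset + buf[offset..].iter().position(|&b| b == 0).expect("no NUL terminator")
}

fn main() {
    println!("{}", swar_strlen(b"hello, world\0")); // 12
}
```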

> I fixed it up to run on modern rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with cargo build --release), and the numbers I get are:

Note that the C benchmarks are all compiled with `-g -O2`. I'm not the author of that benchmark suite and it appears whoever is has abandoned the project.

If I fix the compiler switches (-O3 obviously) and recompile, the numbers I get are:

    Rust: 705
    C_fast: 630
I'm using Rust Nightly because I can't be bothered to install more than one Rust compiler.

That the numbers you are getting aren't stable suggests that your benchmarking technique is shoddy. Try running them with as few applications open as possible.
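Besides closing other applications, a common way to get more stable numbers is to do an untimed warm-up run, time many runs, and report the minimum and median rather than a single result. The sketch below assumes nothing about either gist: `time_runs` is a hypothetical helper and `workload` is a hypothetical stand-in for the benchmarked kernel. `std::hint::black_box` (stable since Rust 1.66) is used so the optimizer cannot discard the work.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Runs `f` once untimed (warm-up), then `runs` timed times.
/// Returns the wall-clock durations sorted from fastest to slowest.
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> Vec<Duration> {
    f(); // warm-up: caches, page faults, lazy initialization
    let mut samples: Vec<Duration> = (0..runs)
        .map(|_| {
            let start = Instant::now();
            f();
            start.elapsed()
        })
        .collect();
    samples.sort();
    samples
}

/// Hypothetical placeholder workload; substitute the real benchmark kernel.
fn workload(n: u64) -> u64 {
    (0..n).fold(0u64, |acc, x| acc.wrapping_add(x * x))
}

fn main() {
    let samples = time_runs(15, || {
        black_box(workload(black_box(1_000_000)));
    });
    let min = samples[0];
    let median = samples[samples.len() / 2];
    let max = samples[samples.len() - 1];
    println!("min {:?}  median {:?}  max {:?}", min, median, max);
}
```

The minimum is the least noise-sensitive figure for a deterministic kernel; a large gap between minimum and maximum is itself a sign that something else on the machine is interfering.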

Here are my updates to the c_fast benchmark:

https://gist.github.com/bjourne/4599a387d24c80906475b26b8ac9...

With this, c_fast's number is 532. That is a fair bit faster than Rust, and I'm sure someone with more time than me, who is also more skilled at optimizing C code, can improve it further.

I'm compiling with `clang -O3 -march=native -mtune=native -fomit-frame-pointer c_fast.c -o c_fast`, and my CPU is an "AMD Phenom(tm) II X6 1090T Processor".

That comparison is misleading for exactly the reasons others have said: the algorithms differ, as can be easily seen in their very different data structures.

A naive, line-by-line port of your fast variant to safe Rust (which I unfortunately am not allowed to share, but which didn't require much thought or time), without bothering with prefetching, gives me numbers more like:

  Rust-fast: 533
  C-fast: 685
I'm using --release for Rust (so no CPU-specific optimisation), and the same invocation as you for C. Everything except my editor is closed when benchmarking, and I'm on an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz.
You really can't cite benchmark results when you don't show the source.
I'm really, really sorry (I want to keep my job), but seriously, the code I benchmarked was a trivial reimplementation of your code. The get_max_cost_small2 function being benchmarked is so small and simple that anyone else porting it would likely end up with something identical!

I'm not trying to act in bad faith: as a member of the Rust core team, that would be braindead and stupid on my part.

Feel free to use my email address (easily findable) and mail me the source. Otherwise, no deal.
I literally cannot share the source, I wish I could but the reality is my job does not let me. You're being unreasonable given how ridiculously simple the benchmarked section of the code is: it would not take long for even a Rust beginner to reimplement something equivalent, especially since it doesn't touch on any of the "hard" parts of Rust (no need for explicit lifetimes etc.).

As I said before, I have nothing to gain and everything to lose by lying to you.