by Manishearth 3663 days ago
Rust can use simd too. It doesn't in those benchmarks. Please apply the common sense you keep harping about. Claiming rust is 2x slower because of that benchmark is a falsehood.

Re:regex: my point exactly. Most microbenchmarks are prone to slight differences in the implementation causing issues (and you can rarely translate code exactly, especially to something like rust which often requires a different structure of code from C. Same for any two other languages). 19% is well within this error box.

The fast_blank thing is this example. fast_blank is a carefully hand optimized C extension whose main purpose is being super fast. A mostly naive Rust one-liner beat it (not by much iirc, perhaps 10%, but that's within the error box I'm talking about). It didn't use parallelism or anything fancy. They weren't even trying to beat C. I provided this proof already.
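For readers who haven't seen it, a "mostly naive one-liner" of this kind might look roughly like the sketch below. This is not the code from the fast_blank comparison; `is_blank` is a hypothetical name, and Rust's notion of whitespace may not match Ruby's `blank?` exactly.

```rust
/// Returns true if the string is empty or consists only of whitespace.
/// Hypothetical helper for illustration; not the code discussed in the thread.
fn is_blank(s: &str) -> bool {
    // `all` short-circuits on the first non-whitespace character.
    s.chars().all(char::is_whitespace)
}
```

The point of the sketch is only that the straightforward version uses no parallelism, no intrinsics, and no unsafe code.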

I could try fixing that benchmark you linked to -- the rust version looks like it could be optimized further. Not sure if it's worth it, really. I don't put much stock in microbenchmarks for anything other than order of magnitude comparisons.


> Rust can use simd too.

No it can't. Either accept that the nightly build of Rust is not the Rust we are talking about or stop discussing with me.

> Re:regex: my point exactly.

The problem with regex libraries is that they are too big, so they don't reveal much about inherent language performance.

> Most microbenchmarks are prone to slight differences in the implementation causing issues (and you can rarely translate code exactly, especially to something like rust which often requires a different structure of code from C. Same for any two other languages).

Yes, obviously the implementation defines performance. That's what I wrote in the other thread part: "this discussion is about Rust vs C. Or rather clang 3.6.2/gcc 5.2.1 vs Rust 1.9.0 since language performance is very implementation dependent"

And fwiw, you can easily transliterate C code to C++ or to asm.

> 19% is well within this error box.

What error box? 19% is a huge difference.

> The fast_blank thing is this example. fast_blank is a carefully hand optimized C extension whose main purpose is being super fast.

I don't know what fast_blank is. Is it this https://github.com/SamSaffron/fast_blank/blob/master/ext/fas... C code wycatz managed to rewrite faster in Rust? That C code isn't well-optimized at all...

> I don't put much stock in microbenchmarks for anything other than order of magnitude comparisons.

Does that mean it is impossible to prove to you that C is at least 2x faster than Rust since twice is less than one order of magnitude?

What's nightly in Rust today will become stable soon enough (idk the timeline for SIMD). But OK. Stable only. In that case, Rust is 2x slower than C code that can be optimized by SIMD. Not much of an issue, really, and it proves nothing about Rust's overhead except that you can't rely on autovectorization. Not really a big deal.
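To make "rely on autovectorization" concrete, here is a minimal sketch of the kind of loop in question. The function name is hypothetical. An optimizing build can often turn a loop like this into SIMD instructions, but nothing in the language guarantees it, and small changes to the loop body can prevent it.

```rust
/// Adds `a` and `b` element-wise into `out`.
/// Hypothetical example: the compiler may or may not emit SIMD code for this
/// loop; there are no explicit intrinsics here.
fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    // `zip` stops at the shortest of the three slices, so no index can go
    // out of bounds and no per-element bounds check is needed.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = *x + *y;
    }
}
```

Whether the loop was actually vectorized can only be confirmed by inspecting the generated assembly for the specific compiler version and flags.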

> 19% is a huge difference.

IMO, in the realm of microbenchmarks, it really isn't. You clearly disagree, not much I can do about that.

> That C code isn't well-optimized at all...

Go ahead and fix it then. You've been telling me much the same. I already mentioned that the other benchmark you linked me to wasn't optimized.

> Does that mean it is impossible to prove to you that C is at least 2x faster than Rust since twice is less than one order of magnitude

I use the term loosely, 2x is certainly alarming. As long as you rely on simd benchmarks I will disagree though, since in most cases a lack of that optimization isn't the reason your program is slow. If you really really care about performance, use nightly rust; there's no cost to that. I have yet to see production C code that uses SIMD everywhere possible, just in some tight loops. That is not going to create a 2x difference in performance unless the tight loop dominates all else. That is not most use cases.
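The "unless the tight loop dominates" argument is essentially Amdahl's law, which the thread does not name but which can be sketched as follows. `overall_speedup` is a hypothetical helper, and the fractions used in the note below are made-up illustrations, not measurements.

```rust
/// Overall speedup when a fraction `p` (between 0.0 and 1.0) of the runtime
/// is made `s` times faster and the rest is unchanged (Amdahl's law).
fn overall_speedup(p: f64, s: f64) -> f64 {
    1.0 / ((1.0 - p) + p / s)
}
```

For instance, if a loop accounts for 10% of the runtime and SIMD makes it 2x faster, `overall_speedup(0.1, 2.0)` is about 1.05. The whole program only approaches 2x as `p` approaches 1.0.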

> Not much of an issue, really, and it proves nothing about Rust's overhead except that you can't rely on autovectorization.

Not much of an issue unless you actually need the performance, of course. Ime, simd intrinsics are everywhere in code optimized to run as quickly as possible on x86. That about half of The Benchmark Game's benchmarks use sse proves that point.

> Go ahead and fix it then. You've been telling me much the same. I already mentioned that the other benchmark you linked me to wasn't optimized.

That requires investing a lot of time in understanding how Ruby's internals and especially its string objects work. I don't have that time. The LPathBench on the other hand is self-contained and updating it shouldn't be more than a few hours of work for a decent Rust programmer.

> Not much of an issue unless you actually need the performance, of course. Ime, simd intrinsics are everywhere in code optimized to run as quickly as possible on x86. That about half of The Benchmark Game's benchmarks use sse proves that point.

My point is that the Benchmark Game is not representative of real world code. The website says as much. Just because the benchmarks use sse everywhere does not mean that most code, even perf-sensitive code, will use simd everywhere.

Again, if you need simd, use a nightly. There's little to no drawback there.

I fixed it up to run on modern rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with cargo build --release), and the numbers I get are:

(Ranges are just what I got from 5 runs, nothing scientific)

Rust: 610-630

c: 706-716

c_fast: 919?

cpp_clang: 669-694

cpp_plain: 717-728

I'm on a new (i7, 16gb) Mac so I don't yet have g++ around (nor do I know how to obtain it without messing things up; I'm used to linux), everything here done with clang.

Of course, this isn't an indication that Rust is faster than C. But it is an indication that it can be just as fast, and a reinforcement of my point about microbenchmarks having large error bars.

Edit:

On my older x86 linux laptop (with gcc):

Rust: 844-987

c_fast: 808-860 (perhaps clang somehow made c_fast slower than c on the mac? shrug)

c: 982-1025

cpp_plain: 977-1019

cpp_gcc: 925-947

I think I've proven my point.
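The "error bars" reading of these numbers can be sketched as a small calculation. The type and method names are hypothetical, and the units are whatever the benchmark prints. The idea is to compare the run-to-run spread of one implementation with the gap between implementations.

```rust
/// A (min, max) range of results from repeated runs of one benchmark.
/// Hypothetical type for illustration.
#[derive(Clone, Copy)]
struct Range {
    lo: f64,
    hi: f64,
}

impl Range {
    /// Midpoint of the range.
    fn mid(self) -> f64 {
        (self.lo + self.hi) / 2.0
    }

    /// Width of the range relative to its midpoint (0.10 means 10%).
    fn spread(self) -> f64 {
        (self.hi - self.lo) / self.mid()
    }

    /// True if the two ranges share at least one value.
    fn overlaps(self, other: Range) -> bool {
        self.lo <= other.hi && other.lo <= self.hi
    }
}
```

Plugging in the Linux ranges above, Rust (844-987) has a spread of roughly 16% and overlaps c_fast (808-860), so those five runs cannot separate the two. On the Mac, Rust (610-630) and c (706-716) do not overlap.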

> My point is that the Benchmark Game is not representative of real world code. The website says as much. Just because the benchmarks use sse everywhere does not mean that most code, even perf-sensitive code, will use simd everywhere.

Your point is incorrect. simd is everywhere in performance sensitive code, like in memcpy, memset, strlen, strcmp, image and video decoding...

> I fixed it up to run on modern rust (https://gist.github.com/Manishearth/5fc73c405641162f0712951c..., compile with cargo build --release), and the numbers I get are:

Note that the C benchmarks are all compiled with `-g -O2`. I'm not the author of that benchmark suite and it appears whoever is has abandoned the project.

If I fix the compiler switches (-O3 obviously) and recompile, the numbers I get are:

    Rust: 705
    C_fast: 630

I'm using Rust Nightly because I can't be bothered to install more than one Rust compiler.

That the numbers you are getting aren't stable suggests that you are using shoddy benchmarking techniques. Try and run them with as few applications open as possible.
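One common way to make run-to-run noise visible is to repeat the measurement and report the fastest and slowest runs. The sketch below assumes wall-clock timing with the standard library's `Instant`. `time_runs` is a hypothetical helper, not part of the benchmark suite under discussion.

```rust
use std::time::{Duration, Instant};

/// Runs `f` `runs` times and returns the (fastest, slowest) wall-clock
/// durations. With `runs == 0` it returns (Duration::MAX, Duration::ZERO).
fn time_runs<F: FnMut()>(runs: usize, mut f: F) -> (Duration, Duration) {
    let mut fastest = Duration::MAX;
    let mut slowest = Duration::ZERO;
    for _ in 0..runs {
        let start = Instant::now();
        f();
        let elapsed = start.elapsed();
        fastest = fastest.min(elapsed);
        slowest = slowest.max(elapsed);
    }
    (fastest, slowest)
}
```

The fastest run is usually the least disturbed by other processes, so a wide gap between the two values suggests background load rather than a property of the code being measured.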

Here are my updates to the c_fast benchmark:

https://gist.github.com/bjourne/4599a387d24c80906475b26b8ac9...

With this c_fast's number is 532. That is a fair bit faster than Rust and I'm sure someone who has more time than me and is more skilled at optimizing C code can improve it further.

I'm compiling with: `clang -O3 -march=native -mtune=native -fomit-frame-pointer c_fast.c -o c_fast` and my cpu is an "AMD Phenom(tm) II X6 1090T Processor"

That comparison is misleading for exactly the reasons others have said: the algorithms differ, as can be easily seen in their very different data structures.

A naive, line-by-line port of your fast variant to safe Rust (which I unfortunately am not allowed to share, but didn't require much thinking nor much time), without bothering with prefetching, gives me numbers more like:

    Rust-fast: 533
    C-fast: 685

I'm using --release for Rust (so no CPU-specific optimisation), and the same invocation as you for C. Everything except my editor is closed when benchmarking, and I'm on an Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz.

Oh, and transliterating C and C++ is an exception to the norm. C and C++ are historically linked and quite similar in many ways. Rust does not have this relationship with C. You could easily transliterate C code to unsafe rust code, but that sort of misses the point, doesn't it? :)
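As a rough illustration of that last point, here is a C-style loop transliterated to unsafe Rust next to a version written the way safe Rust is usually structured. Both function names are hypothetical and neither comes from the benchmarks in this thread.

```rust
/// C-style transliteration: walks a raw pointer over `len` elements.
///
/// Safety: `ptr` must point to `len` readable, initialized `i32` values.
unsafe fn sum_c_style(ptr: *const i32, len: usize) -> i32 {
    let mut total = 0i32;
    let mut i = 0;
    while i < len {
        // Raw pointer arithmetic and dereference, as in `total += ptr[i];`.
        total = total.wrapping_add(unsafe { *ptr.add(i) });
        i += 1;
    }
    total
}

/// The same computation in safe Rust: the slice carries its own length,
/// so the caller has no safety contract to uphold.
fn sum_idiomatic(xs: &[i32]) -> i32 {
    xs.iter().fold(0i32, |acc, x| acc.wrapping_add(*x))
}
```

The first version compiles and behaves like the C original, but it gives up the checks that are the reason for using Rust in the first place.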