Hacker News
by bjourne 3665 days ago
Performance. Rust is still twice as slow as C (http://benchmarksgame.alioth.debian.org/u64q/performance.php...) which is still a fair bit slower than if a skilled assembly programmer had taken on the task.

Rust aficionados will say that their compiler is getting better, but so are the C compilers. clang has gotten faster than gcc on some benchmarks, and on others gcc has caught up and is now faster than clang again.

But what if you don't need optimal performance? Then you can use Rust. But then you can also use Go, Python, SBCL, Haskell, Java, C#...

9 comments

The difference on this test is caused by Rust not having stabilized SIMD support. Also, Rust supports hand-rolled assembly (on nightly), as C does.

On non-SIMD tasks Rust/C are neck and neck https://benchmarksgame.alioth.debian.org/u64q/rust.html

You're just cherry-picking benchmarks. In the cases where you care about raw number-crunching power, you'll likely be using a GPU, not SIMD instructions, as CPUs are roughly 3-4 orders of magnitude slower than GPUs at pure number-crunching tasks.

Not that SIMD isn't important, as its instructions also cover things like AES, SHA1/2, random numbers, cache pre-loading/evacuation, memory fences, and fast loading paths. But so few programmers worry about these things that you are really hitting a niche market.

GPUs aren't a panacea. Such generalizations are wrong and will have you rearchitecting your approach after hitting GPU I/O bottlenecks.

>GPUs aren't a panacea. Such generalizations are wrong

This is true. But if you are doing hard number crunching, you are using a GPU once you exhaust what a CPU can do. And most of the time you do so before you even touch SIMD, as it's only a 4-8x speedup, while a GPU is 1000-10,000x.

>will have you rearchitecting your approach after hitting GPU I/O bottlenecks.

90% of these are caused by bad software: either using legacy APIs, or writing code that forces the GPU to talk to the processor more often than is necessary.

PCIe bandwidth is ~7.88GB/s [1] on modern Intel chips (post 3xxx series). Compute GPUs often offer >10GB of on-board RAM, with last generation's flagships offering a staggering 32GB or more.

And if you are hitting a wall with GPU compute/IO limits, CPU SIMD instructions aren't going to help you in the slightest.

[1] http://www.tested.com/tech/457440-theoretical-vs-actual-band...

It's not the bandwidth that's the problem, it's the latency. Many problems need more control flow and branching, and that is better done on the CPU. If you need to make decisions and take the previous iteration's output as an input, then the overhead of transferring the data back and forth between CPU and GPU outweighs the benefit of the GPU's speed.
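To make the latency argument concrete, here is a back-of-the-envelope cost model in Rust. It is only a sketch: the ~7.88 GB/s figure comes from the comment above, while the per-transfer latency, the step times, and the function names are all assumed placeholders, not measurements.

```rust
/// Seconds for one host->device->host round trip:
/// a fixed latency plus payload / bandwidth, paid in each direction.
fn round_trip_secs(bytes: f64, latency_secs: f64, bandwidth_bytes_per_sec: f64) -> f64 {
    2.0 * (latency_secs + bytes / bandwidth_bytes_per_sec)
}

/// Total GPU time when every iteration depends on the previous one,
/// so each iteration pays a full round trip plus the kernel time.
fn gpu_total_secs(iters: u32, bytes: f64, kernel_secs: f64, latency_secs: f64, bw: f64) -> f64 {
    iters as f64 * (kernel_secs + round_trip_secs(bytes, latency_secs, bw))
}

/// Total CPU time: no transfers, just the per-step compute cost.
fn cpu_total_secs(iters: u32, cpu_step_secs: f64) -> f64 {
    iters as f64 * cpu_step_secs
}

fn main() {
    let bw = 7.88e9; // bytes/s, the PCIe figure quoted in the thread
    let latency = 10e-6; // ASSUMED: 10 microseconds fixed cost per transfer

    // ASSUMED workload 1: a million small dependent steps of 4 KiB each,
    // where the GPU kernel is 100x faster than the CPU step.
    let gpu_small = gpu_total_secs(1_000_000, 4096.0, 0.05e-6, latency, bw);
    let cpu_small = cpu_total_secs(1_000_000, 5e-6);
    println!("small dependent steps: gpu {:.2}s, cpu {:.2}s", gpu_small, cpu_small);

    // ASSUMED workload 2: one large independent block of 1 GB,
    // again with the GPU kernel 100x faster than the CPU.
    let gpu_large = gpu_total_secs(1, 1e9, 0.1, latency, bw);
    let cpu_large = cpu_total_secs(1, 10.0);
    println!("one large block: gpu {:.2}s, cpu {:.2}s", gpu_large, cpu_large);
}
```

Under these made-up numbers the CPU wins the first workload because the fixed latency dominates each tiny step, and the GPU wins the second because the transfer cost is amortized over a lot of compute.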

GPUs are good if you have an independent data parallel algorithm you are using to transform large blocks of floating point data. For other uses, CPU is better.

> The difference on this test is caused by Rust not having stabilized SIMD support. Also, Rust supports hand-rolled assembly (on nightly), as C does.

Cool. Let's call that language with SIMD and inline assembly support FutureRust(tm) to differentiate it from the currently released and available Rust. We can have a discussion about how fast FutureRust will be vs C, but this discussion is about Rust vs C. Or rather clang 3.6.2/gcc 5.2.1 vs Rust 1.9.0 since language performance is very implementation dependent.

> On non-SIMD tasks Rust/C are neck and neck https://benchmarksgame.alioth.debian.org/u64q/rust.html

In 5 of 10 benchmarks, C is twice as fast as Rust. In one of the benchmarks where it is neck and neck, like pidigits (https://benchmarksgame.alioth.debian.org/u64q/performance.ph...) it appears to be so because both the C and the Rust variant are wrapping libgmp. GMP is written in C.

>In 5 of 10 benchmarks, C is twice as fast as Rust

fannkuch-redux why? SIMD

fasta-redux why? SIMD

spectral-norm why? SIMD

reverse-complement why? SIMD

N-Body why? Oh you guessed it SIMD

Seriously read the source code. Remember on HN where a lot of people constantly say the benchmark game is really crappy. This is why. All 5 of these tests boil down to raw FLOPS. Which C/C++ having access to SIMD instructions wins at.

The fact that the Rust/C performance difference works out to just the ability to emit vector instructions says a lot about everything else in Rust. The fact that Rust can dereference, pass variables on the stack, call functions, and make decisions as fast as C renders your core point completely moot.

You are just being incredibly pedantic for no reason. And your argument holds no water. Everything Rust does is identical to C except one barely used corner case. They use the exact same model for computation, they both live in the Cee-LangVM. Post compilation they are functionally identical (except Rust makes stack manipulation easier).

Does any of that make sense to you?

:.:.:

Also, Rust and C both calling GMP without a time difference is a good thing. The Rust->C FFI overhead is literally non-existent in practice. Dipping into C code from Rust (and vice versa) has no penalty. The same can't be said for HUNDREDS of languages.

Rust is also slower in binarytrees, regexdna and fasta. SSE is not one "barely used corner case" because huge amounts of performance critical code takes advantage of it.

Edit: The reason I don't believe you when you say that "Post compilation they are functionally identical [in performance]" is that, if it were so, you could just transliterate the C solutions to their Rust equivalents and they would run as fast as C. Since that hasn't been done and is trivial to do, my conclusion is that it doesn't lead to the same performance.

Did you know Rust was quite a bit faster than C in regexdna merely a few months ago? It didn't get slower because of Rust. The algorithms employed are radically different. My hope is that the regex library has already regained performance, but until the benchmark game is updated (which is on us, not the benchmark game maintainer), I suppose we'll have to suffer the pedants!

Or perhaps, you might look at single threaded performance and wonder, maybe there is something more interesting going on than a naive surface analysis of C vs. Rust! :-) https://benchmarksgame.alioth.debian.org/u64/rust.php

And by the way, transliterating a regex library isn't trivial. I invite you to transliterate Tcl's regex library. Let me know how that goes. ;-) So I think your reasoning is specious at best.

> It didn't get slower because of Rust.

Do you mean the program became relatively slower because of changes you've made to the regex crate?

Wasn't the program relatively faster because you wrote the regex crate to use Aho-Corasick for the matches required by the regex-dna task?

Please don't point people to u64 -- it's no longer updated. (Note the rustc version.)
Slower by a tiny amount, and still faster than other C implementations. It's within the error box.

Also iirc there are improvements to those benchmarks in the pipeline, idk what happened to them (Veedrac and llogiq had something in mind).

Sure, you could hand-translate C in many cases (not regex), but that would be far from idiomatic. Most of the rust solutions try to still look Rust-y.

Regarding SSE, if you care about performance and SSE, use a nightly compiler. That option exists. Rust nightly is still Rust.

You can also just bundle Rust w/ LLVM and have it JIT compile your application on start up which'll yield huge performance gains too.

But people may get salty about binary image size.

>> Remember on HN where a lot of people constantly say the benchmark game is really crappy. This is why.

Because the benchmarks game shows some programs to be faster, and you agree those programs actually would be faster? :-)

>> All 5 of these tests boil down to raw FLOPS.

Where exactly are the floating-point operations in fannkuch-redux Rust #2 program ?

Where exactly are the floating-point operations in reverse-complement ?

("Seriously read the source code" ?)

>> fannkuch-redux why? SIMD

Look how many other programs, written in various languages, are shown ahead of the fannkuch-redux Rust #2 program.

Maybe you can write a better Rust fannkuch-redux program (even without SIMD).

Nightly Rust is also "currently released and available," and a significant part of the ecosystem can take advantage of it.

Besides, the vast majority of the work to close the gap between C and Rust in the benchmark game was from people optimizing the benchmarked programs, not from any language or compiler changes. There is no inherent 2x slowdown in any meaningful sense.

The inherent 2x slowdown just works out to `__m128` vs `f64`. C can double Rust's FLOP throughput.
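As a rough illustration of where a 2x figure could come from: a 128-bit SSE register holds two `f64` lanes (the C type for that is `__m128d`; `__m128` holds four `f32` lanes). The sketch below uses no intrinsics, and the function names are made up for this example. It only mirrors the data layout: the second function does two independent multiply-adds per loop step, which is the shape a packed instruction would execute at once. Whether a compiler actually emits vector instructions for it is not guaranteed.

```rust
/// Scalar shape: one f64 multiply-add per loop step.
fn dot_scalar(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Two-lane shape: two independent accumulators per loop step,
/// mirroring a 128-bit register that holds two f64 values.
fn dot_two_lanes(a: &[f64], b: &[f64]) -> f64 {
    assert_eq!(a.len(), b.len());
    let mut acc = [0.0f64; 2];
    let mut ca = a.chunks_exact(2);
    let mut cb = b.chunks_exact(2);
    for (x, y) in (&mut ca).zip(&mut cb) {
        acc[0] += x[0] * y[0];
        acc[1] += x[1] * y[1];
    }
    // Handle the leftover element when the length is odd.
    let tail: f64 = ca
        .remainder()
        .iter()
        .zip(cb.remainder())
        .map(|(x, y)| x * y)
        .sum();
    acc[0] + acc[1] + tail
}

fn main() {
    let a = [1.0, 2.0, 3.0, 4.0, 5.0];
    let b = [2.0; 5];
    println!("scalar: {}", dot_scalar(&a, &b));
    println!("two lanes: {}", dot_two_lanes(&a, &b));
}
```

Note that the two versions add the products in a different order, so on arbitrary floating-point inputs the results can differ in the last bits.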

The fact that function calls, if statements, and passing variables to functions run at identical speed to C is lost on the parent poster. These core points show that Rust and C are equally fast.

Or it is faster than C (http://benchmarksgame.alioth.debian.org/u64q/performance.php...). Depends which link you click on.

From the looks of it, that Rust program spawns 20 threads and does the computations in parallel. The C program does it all in one thread and doesn't even utilize SSE intrinsics. I know full well that The Computer Language Benchmarks Game isn't a perfect source for programming language speed arguments, but what can you do.

So it's okay to claim Rust is slower than C by cherry-picking SIMD benchmarks (Rust can do SIMD btw, just not on stable), but not okay to claim C is slower than Rust by cherry-picking a parallelization benchmark? If you don't think that benchmark is fair, submit a parallel solution in C.

The benchmarks game is far from being even a useful source. It gives order of magnitude answers, and that's pretty much it. Using it (cherry-picked!) to back up a claim that Rust is 2x slower than C is disingenuous. "but what can you do" -- don't make absolute arguments about something using imprecise data.

Actual rust in real world programs may actually end up being faster than C (see Yehuda Katz's talk on fast_blank in rust). C often needs to be hand-optimized. Rust, with its zero cost abstractions, often doesn't need to be; a naive program in rust would probably be faster than the same in C.

Remember that fundamentally rust compiles the same way c does, and your rust code shouldn't have any more overhead. (except drop flags -- a minor cost -- which are something you might hand-implement in c anyway). We also use llvm, so we get mostly the same compiler optimizations.

I wrote "what can you do" because you are supposed to use some modicum of common sense when reading the numbers on The Benchmark Game. E.g in one of the benchmarks PHP beats both C and Rust, so you need to apply common sense to understand that that result is an outlier.

I didn't cherry-pick; in 5/10 benchmarks, C is twice as fast as Rust.

> rust code shouldn't have any more overhead.

But it appears that it has.

> We also use llvm, so we get mostly the same compiler optimizations.

That is not a guarantee for efficient code. For example, in my testing, g++ is over 50% faster than clang++ in certain template-heavy scenarios.

>I didn't cherry-pick; in 5/10 benchmarks, C is twice as fast as Rust.

Making the claim that C is twice as fast as Rust because of 5/10 benchmarks in the "benchmark game" shows an incredible lack of common sense to me.

In 5/10 benchmarks, the benchmarks game claims Go has equal if not better performance than Rust. Am I supposed to believe now that a managed, garbage-collected language with a 6-year-old compiler is as fast as a language without a runtime running on LLVM?

Don't back up your claim with flawed benchmarks.

>> Don't back up your claim with flawed benchmarks.

:-)

"How fast is Rust? Fast! Rust is already competitive with idiomatic C and C++ in a number of benchmarks (like the Benchmarks Game and others)."

https://www.rust-lang.org/faq.html#performance

>> In 5/10 benchmarks, the benchmarks game claims Go has equal if not better performance than Rust. Am I supposed to believe…

Believe that those Rust programs gave those measurements, and those Go programs gave those measurements (when compiled and measured as described on the website in tedious detail).

It does matter how the programs are written!

Write better Rust implementations for those tasks and contribute them --

http://benchmarksgame.alioth.debian.org/play.html

> Making the claim that C is twice as fast as Rust because of 5/10 benchmarks in the "benchmark game" shows an incredible lack of common sense to me.

Actually, 2x is likely the lower bound of how much faster well-written C is than Rust. Rust developers have an interest in promoting their language, so they will make sure their test programs run as fast as possible. C doesn't need that kind of marketing.

For example, the Rust solutions were all updated in 2015, while the C solutions haven't been touched since 2013.

> In 5/10 benchmarks, the benchmarks game claims Go has equal if not better performance than Rust. Am I supposed to believe now that a managed, garbage-collected language with a 6-year-old compiler is as fast as a language without a runtime running on LLVM?

It doesn't run on LLVM. It takes advantage of LLVM to compile ELF executables. I said that common sense should be used.

No, it does not appear that Rust has any overhead over C, except in cases that use SIMD. That is not a generally applicable result.
> some modicum of common sense when reading the numbers on The Benchmark Game

yes, this involves checking what the benchmarks are actually measuring. In this case, it is how much faster SIMD makes things. Factor that in, or rewrite the programs with SIMD in rust, and it should come out to be the same.

> But it appears that it has.

Have you not been listening? It doesn't. The speed differences you quote are due to simd. Rust has simd support, just not in a non-nightly compiler.

> Have you not been listening? It doesn't. The speed differences you quote are due to simd.

That should be easy to demonstrate!

Please quote the lines in the source-code of these fannkuch-redux and reverse-complement programs that show SIMD use --

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...

http://benchmarksgame.alioth.debian.org/u64q/program.php?tes...

Using the nightly builds of any programming language in production is insane. That's why we call it FutureRust to differentiate it from what is production ready Rust.

That Rust might have SIMD intrinsics in the future matters little to people trying to seriously use Rust today. And in several benchmarks C handily beats Rust even without intrinsics. Such as the fannkuch-redux one where it is about 2x faster.

Yeah, given that the CPU load for the C lines is listed as

    100% 95% 95% 95%
on a quadcore, I'm going to say "not one thread".

Edit: to be helpful, rather than just obnoxious, (<3) the C version uses OpenMP pragmas, so it looks single-threaded.

You're right - for microbenchmarks, the difference between Rust and C is going to be in how the solution is implemented, because the languages have such similar performance characteristics. This is why your original comment that Rust is not as performant as C is quite silly. You look rather hypocritical turning around and pointing it out when someone links to a microbenchmark on which Rust outperforms C.
Both examples are cherry-picked, is the point. Neither is particularly helpful in representing the whole story.

That said, Rust makes it ridiculously easy to use parallelism if a task can support it, which is a strong advantage in its favor.
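As a sketch of what "easy to use parallelism" can look like with only the standard library, here is a chunked parallel sum. It uses `std::thread::scope`, which was stabilized well after this thread was written, and the function name and chunking scheme are choices made for this example rather than anything taken from the benchmark programs under discussion.

```rust
use std::thread;

/// Sums `data` by splitting it into roughly equal chunks, one scoped thread per chunk.
/// Scoped threads may borrow `data` directly, so no copying or Arc is needed.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division, and at least 1 so `chunks` never receives 0.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    thread::scope(|s| {
        // Spawn everything first so the chunks are processed concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();

        // Then collect the partial sums.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}

fn main() {
    let data: Vec<u64> = (1..=1000).collect();
    println!("sum = {}", parallel_sum(&data, 4));
}
```

The borrow checker rejects versions of this that would let two threads mutate the same data, which is the part that is hard to get right by hand in C.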

> but what can you do.

Not make sweeping generalizations based on one benchmark you didn't write?

Well, if you are familiar enough with the languages in question, you can look for submissions that appear to use mostly idiomatic expressions of the language. Alternatively, if you are more interested in maximum possible performance, you can confirm both are making use of advanced techniques for quicker execution, but that's probably much harder to judge.

I'm not sure that's the best source you can use. C is listed 5 times on there, the slowest taking 15 seconds. Likewise, C #3 has timings similar to Rust's. So is it the case that C is actually faster, or did the guy who wrote C #5 optimize the code and neglect to give the same optimizations to Rust?

Rust can, to some extent already does, and ultimately will beat C on performance, due to way better guarantees on pointer non-aliasing. See e.g. http://stackoverflow.com/questions/146159/is-fortran-faster-...
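A minimal sketch of the aliasing point, with a made-up function name. In C, two plain pointer parameters may refer to the same object unless they are marked `restrict`, so the compiler has to reload through the second pointer after writing through the first. In Rust, a `&mut` reference is guaranteed to be exclusive, so the value behind the shared reference cannot change in between. How much of this guarantee the compiler actually exploits has varied between rustc versions, so treat the optimization as possible rather than promised.

```rust
/// Adds `*delta` to `*total` twice.
/// `total` is an exclusive borrow, so it cannot alias `delta`;
/// the compiler is therefore allowed to read `*delta` only once.
fn add_twice(total: &mut i64, delta: &i64) {
    *total += *delta;
    *total += *delta;
}

fn main() {
    let mut total = 1;
    let delta = 5;
    add_twice(&mut total, &delta);
    println!("total = {}", total);

    // The aliased call is rejected at compile time:
    // add_twice(&mut total, &total); // error: cannot borrow `total` as immutable
}
```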

Other than that, Rust has way better zero-cost abstractions, so in practice it allows writing faster code. In C there are sanity limits after which you give up and write slower but easier-to-manage code, as the macro processor sucks and the type system is trying to stab you in the back at every step.

A good example is `qsort`: http://www.tutorialspoint.com/c_standard_library/c_function_...

What are we looking for in the qsort example? Is there a Rust equivalent we should compare to?

The C code is forced to use a function pointer and hence do dynamic virtual calls for each comparison, while Rust and C++ can use generics/templates to get static dispatch (and hence inlining, constant folding etc.). You can see C vs. C++ in http://www.martin-ueding.de/en/programming/qsort/index.html , and Rust is likely to be similar to C++.
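The two dispatch styles can be put side by side in Rust itself. This is a sketch with hypothetical wrapper names: the first function takes the comparator as a plain function pointer, which is the shape `qsort` imposes, and the second is generic over the comparator type, so the compiler generates a specialized copy per comparator and is free to inline it.

```rust
use std::cmp::Ordering;

/// qsort-style: the comparator is a function pointer, so each comparison
/// is an indirect call unless the optimizer manages to see through it.
fn sort_with_fn_ptr(v: &mut [i32], cmp: fn(&i32, &i32) -> Ordering) {
    v.sort_by(cmp);
}

/// Generic style: monomorphized for each concrete comparator type,
/// which gives static dispatch and makes inlining straightforward.
fn sort_generic<F>(v: &mut [i32], cmp: F)
where
    F: FnMut(&i32, &i32) -> Ordering,
{
    v.sort_by(cmp);
}

/// An ordinary function usable as a comparator in either style.
fn descending(a: &i32, b: &i32) -> Ordering {
    b.cmp(a)
}

fn main() {
    let mut a = [3, 1, 2];
    sort_with_fn_ptr(&mut a, descending);
    println!("{:?}", a);

    let mut b = [3, 1, 2];
    sort_generic(&mut b, |x, y| x.cmp(y));
    println!("{:?}", b);
}
```

Both calls sort correctly; the difference is only in what the optimizer is handed, which is why measured gaps depend on the compiler and on options such as link-time optimization.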
LTO will allow qsort to be inlined, etc.

Most of the slower benchmarks were fixed and just need to be merged iirc.

I don't have much faith in microbenchmarks. Usually all they measure is how much effort the author put into overoptimizing code.

There's a pretty big chasm between "only 2x as slow as C" and "not needing optimal performance". That's even assuming it is generally true and a constant 2x factor.

How many skilled assembly programmers do you know who are able to write better code than the collective intelligence embedded in current compilers? Even if you have a few of them handy, isn't their time better spent hand-optimizing the compiler output for critical sections only?

Isn't Rust's biggest advantage safety? I make websites with golang, and I can't stand the nil-ness and all the mutation mess I easily get myself into; I could use a safer type system.

Sometimes you just need safety and correctness.

Since people were so angry that I used The Benchmark Game as a source for benchmarks, here (https://github.com/logicchains/LPATHBench/blob/master/writeu...) is another microbenchmark showing gcc & clang handily beating rustc. The timings are a year and a half old, though, so the relative performance might have changed significantly.

That's pre-1.0. Rust has changed a lot.

Also, given how prone microbenchmarks are to depending on hand-optimization over the compiler quality, benchmarks should have been contributed to by the community -- I don't think anyone in rust has heard about this one.

Oh also, Rust is as fast as C/C++ there. It's just not faster than C++Cached, _which is a different algorithm_. That's the problem with microbenchmarks, you end up measuring differences in the algorithm used.

No. In the first table Rust has 1874 and C++/clang 1722. The latter number is lower. C++ with clang beats Rust.

In the second table all C and C++ versions beat Rust. 618, 749, 755 and 735 vs 877. That is a very big difference.
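For reference, the size of that gap can be worked out from the five figures quoted above. This is a small sketch, assuming (as the comment does) that lower numbers are better; the units are whatever the linked table uses, and the helper name is made up.

```rust
/// How much longer `rust` takes than `other`, as a percentage of `other`.
fn slowdown_percent(rust: f64, other: f64) -> f64 {
    (rust / other - 1.0) * 100.0
}

fn main() {
    let rust = 877.0;
    // The four C and C++ figures quoted from the second table.
    for other in [618.0, 749.0, 755.0, 735.0] {
        println!(
            "{:.0} vs {:.0}: Rust takes {:.1}% longer",
            rust,
            other,
            slowdown_percent(rust, other)
        );
    }
}
```

On these numbers the Rust entry is between roughly 16% and 42% slower, depending on which C or C++ entry it is compared against.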

You can also run the fucking benchmarks yourself and see for yourself. I have linked to lots of benchmarks showing C spanking Rust. No one has shown any fair benchmark where Rust is as fast as C.

That's a ... very small difference. And again, probably due to implementation differences. I'm not claiming C doesn't beat Rust, I'm just saying by very little -- Rust is practically just as fast, within the margin of error that microbenchmarks have. You have been belting out claims that Rust is 2x slower -- clearly false. Rust may be 5% slower -- which ... doesn't really matter.

Look at wycats' talk on fast_blank. That's a real world example that's faster than C. Rust used to be faster than c on the regex benchmark at one point, as burntsushi pointed out.

> That's a ... very small difference. And again, probably due to implementation differences.

735 / 618 = 1.19 So Rust is at least 19% slower than C even without involving SIMD intrinsics. You wrote "your rust code shouldn't have any more overhead" but in all the benchmarks it has!

> You have been belting out claims that Rust is 2x slower -- clearly false.

Clearly not, since it is on all the SIMD-using benchmarks.

> Look at wycats' talk on fast_blank. That's a real world example that's faster than C. Rust used to be faster than c on the regex benchmark at one point, as burntsushi pointed out.

Because it is comparing different regex engines, not language performance. I said you should apply "common sense" to The Benchmark Game's numbers.

Here's the thing, you guys can easily prove me wrong. Prove that Rust has zero cost abstractions by taking any small C benchmark, transliterate it to Rust code and profile it. If it is as fast, I'm proven wrong. If it is slower you are proven wrong.

Rust can use simd too. It doesn't in those benchmarks. Please apply the common sense you keep harping about. Claiming rust is 2x slower because of that benchmark is a falsehood.

Re:regex: my point exactly. Most microbenchmarks are prone to slight differences in the implementation causing issues (and you can rarely translate code exactly, especially to something like Rust, which often requires a different code structure from C. Same for any two other languages). 19% is well within this error box.

The fast_blank thing is this example. fast_blank is a carefully hand-optimized C extension whose main purpose is being super fast. A mostly naive Rust one-liner beat it (not by much iirc, perhaps 10%, but that's within the error box I'm talking about). It didn't use parallelism or anything fancy. They weren't even trying to beat C. I provided this proof already.
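For a sense of what such a naive one-liner looks like, here is a sketch of a blank-string check in Rust. It is not the code from the talk, and `is_blank` is a name chosen for this example; Ruby's notion of "blank" may also differ from `char::is_whitespace` in exactly which characters count.

```rust
/// Returns true if the string is empty or consists only of whitespace.
/// Iterator adaptors like `all` compile down to a plain loop with an early exit.
fn is_blank(s: &str) -> bool {
    s.chars().all(char::is_whitespace)
}

fn main() {
    println!("{}", is_blank("  \t\n"));
    println!("{}", is_blank("  a  "));
}
```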

I could try fixing that benchmark you linked to -- the Rust version looks like it could be optimized further. Not sure if it's worth it, really. I don't put much stock in microbenchmarks for anything other than order-of-magnitude comparisons.

> Because it is comparing different regex engines, not language performance. I said you should apply "common sense" to The Benchmark Game's numbers.

Then don't also use regex-dna as evidence that Rust is "slow":

> Rust is also slower in binarytrees, regexdna and fasta.

You can't have it both ways.

> Rust is still twice as slow as C which is still a fair bit slower than if a skilled assembly programmer had taken on the task.

Really? Are there really people who write (a lot of) assembly in order to get code "a fair bit" faster than C? What on earth are they working on?

VM implementations and garbage collectors.