HN Mirror

by valarauca1 3662 days ago

> In 5 of 10 benchmarks, C is twice as fast as Rust

fannkuch-redux why? SIMD

fasta-redux why? SIMD

spectral-norm why? SIMD

reverse-complement why? SIMD

N-Body why? Oh you guessed it SIMD

Seriously read the source code. Remember on HN where a lot of people constantly say the benchmark game is really crappy. This is why. All 5 of these tests boil down to raw FLOPS. C/C++, having access to SIMD instructions, wins at that.

The fact that Rust/C performance difference works out to just the ability to emit vector instructions says a lot about everything else in Rust. The fact that Rust can dereference, pass variables on the stack, call functions, and make decisions as fast as C renders your core point completely moot.

You are just being incredibly pedantic for no reason. And your argument holds no water. Everything Rust does is identical to C except one barely used corner case. They use the exact same model for computation, they both live in the Cee-LangVM. Post compilation they are functionally identical (except Rust makes stack manipulation easier).

Does any of that make sense to you?

:.:.:

Also, Rust and C both calling GMP without a time difference is a good thing. The Rust->C FFI overhead is effectively non-existent in practice. Dipping into C code from Rust (and vice versa) has no penalty. The same can't be said for HUNDREDS of languages.

bjourne 3662 days ago

Rust is also slower in binarytrees, regexdna and fasta. SSE is not a "barely used corner case": huge amounts of performance-critical code take advantage of it.

Edit: The reason I don't believe you when you say that "Post compilation they are functionally identical [in performance]" is that, if it were so, you could just transliterate the C solutions to their Rust equivalents and they would run as fast as C. Since that hasn't been done and is trivial to do, my conclusion is that it doesn't lead to the same performance.

burntsushi 3662 days ago

Did you know Rust was quite a bit faster than C in regexdna merely a few months ago? It didn't get slower because of Rust. The algorithms employed are radically different. My hope is that the regex library has already regained performance, but until the benchmark game is updated (which is on us, not the benchmark game maintainer), I suppose we'll have to suffer the pedants!

Or perhaps you might look at single-threaded performance and wonder whether there is something more interesting going on than a naive surface analysis of C vs. Rust! :-) https://benchmarksgame.alioth.debian.org/u64/rust.php

And by the way, transliterating a regex library isn't trivial. I invite you to transliterate Tcl's regex library. Let me know how that goes. ;-) So I think your reasoning is specious at best.

igouy 3661 days ago

> It didn't get slower because of Rust.

Do you mean the program became relatively slower because of changes you've made to the regex crate?

Wasn't the program relatively faster because you wrote the regex crate to use Aho-Corasick for the matches required by the regex-dna task?

burntsushi 3661 days ago

> Do you mean the program became relatively slower because of changes you've made to the regex crate?

Yes. The underlying reasoning is complex. When the regex crate got a lazy DFA (similar to the one used by RE2), the vast majority of regexes got significantly faster. Some got slower. This one in particular from regex-dna:

    >[^\n]*\n|\n

Before the lazy DFA, compile time analysis would notice that all matches either start with `>` or `\n` and do a fast prefix scan for them. Each match of `>` or `\n` represents a candidate for a match. Candidates were then verified using something similar to the Thompson NFA, which is generally pretty slow, but the prefix scanning reduced the amount of work required considerably.
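A minimal Rust sketch of the prefix-scan-then-verify idea described above. It is not the regex crate's code: the function names `find_header_or_newline` and `verify` are hypothetical, the verifier is hand-written for this one regex instead of being an NFA simulation, and the scan is a plain byte loop where a real implementation would likely use something like `memchr`.

```rust
/// Find all non-overlapping matches of `>[^\n]*\n|\n` in `haystack`,
/// returned as (start, end) byte offsets with `end` exclusive.
fn find_header_or_newline(haystack: &[u8]) -> Vec<(usize, usize)> {
    let mut matches = Vec::new();
    let mut pos = 0;
    while pos < haystack.len() {
        // Prefix scan: every match must begin with `>` or `\n`, so skip
        // straight to the next such byte instead of trying every position.
        let start = match haystack[pos..]
            .iter()
            .position(|&b| b == b'>' || b == b'\n')
        {
            Some(offset) => pos + offset,
            None => break, // no candidates left
        };
        // Verification: the candidate is only a match if the rest of the
        // pattern also matches from here.
        match verify(haystack, start) {
            Some(end) => {
                matches.push((start, end));
                pos = end;
            }
            None => pos = start + 1,
        }
    }
    matches
}

/// Hand-written stand-in for the slower general-purpose verifier.
/// Returns the exclusive end offset of a match starting at `start`.
fn verify(haystack: &[u8], start: usize) -> Option<usize> {
    match haystack[start] {
        // Second alternative: a lone newline.
        b'\n' => Some(start + 1),
        // First alternative: `>` then any non-newline bytes then `\n`.
        b'>' => haystack[start + 1..]
            .iter()
            .position(|&b| b == b'\n')
            .map(|offset| start + 1 + offset + 1),
        _ => None,
    }
}
```

On input such as `b">seq1 desc\nacgt\ntt>x"`, the scan visits only three candidate positions, and the trailing `>x` is rejected by the verifier because no newline follows it.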

Once the lazy DFA was added, the prefix scanning was still used, but the lazy DFA was used to verify candidates. It's faster in general by a lot, but the lazy DFA requires two scans of the candidate: one to find the end position and another to find the start position. That extra scan made processing this regex (on the regex-dna input) slightly slower.

I've since fixed some of this by reducing a lot of the match overhead of the lazy DFA, so my hope is that it's back to par, but I haven't done any rigorous benchmarking to verify that.

> Wasn't the program relatively faster because you wrote the regex crate to use Aho-Corasick for the matches required by the regex-dna task?

Aho-Corasick is principally useful for the second phase of regex-dna, e.g., the regexes that look like `ag[act]gtaaa|tttac[agt]ct`. (In the last phase, all the regexes are just single byte literals, so neither Aho-Corasick nor the regex engine should ever be used.) Performance here should stay the same.
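To illustrate why regexes like `ag[act]gtaaa|tttac[agt]ct` suit a multi-literal matcher, here is a rough Rust sketch that expands such a pattern into its finite set of literals. The helpers `expand_alternative`, `expand_literals` and `count_starts` are hypothetical names, the parser only understands plain characters, simple `[...]` classes and `|`, and the search is a naive loop standing in for Aho-Corasick.

```rust
/// Expand one alternative made of plain characters and simple classes
/// such as `[act]` into every literal string it can match.
fn expand_alternative(pattern: &str) -> Vec<String> {
    let mut literals = vec![String::new()];
    let mut chars = pattern.chars();
    while let Some(ch) = chars.next() {
        // Characters allowed at this position: the class members, or the
        // single literal character. `take_while` also consumes the `]`.
        let choices: Vec<char> = if ch == '[' {
            chars.by_ref().take_while(|&c| c != ']').collect()
        } else {
            vec![ch]
        };
        let mut next = Vec::with_capacity(literals.len() * choices.len());
        for prefix in &literals {
            for &c in &choices {
                let mut literal = prefix.clone();
                literal.push(c);
                next.push(literal);
            }
        }
        literals = next;
    }
    literals
}

/// Expand a whole `a|b|c` pattern by expanding each alternative in turn.
fn expand_literals(pattern: &str) -> Vec<String> {
    pattern.split('|').flat_map(expand_alternative).collect()
}

/// Naive multi-literal search: count the positions where any literal
/// starts. Aho-Corasick finds the same positions in a single pass.
fn count_starts(haystack: &str, literals: &[String]) -> usize {
    let bytes = haystack.as_bytes();
    (0..bytes.len())
        .filter(|&i| literals.iter().any(|lit| bytes[i..].starts_with(lit.as_bytes())))
        .count()
}
```

The example pattern expands to six 8-byte literals, so no regex engine is needed for it once a matcher for the literal set is available.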

On that note, I have a new nightly-only algorithm called Teddy that uses SIMD[1] (which replaces the use of Aho-Corasick for those regexes) and is a bit faster. I got the algorithm from the Hyperscan[2] project, which also does extensive literal analysis to speed up regexes.

To clarify, this optimization is generally useful because a lot of regexes in the wild have prefix literals. Even something like `(?i:foo)\s+bar` can benefit from it, since `(?i:foo)` expands to FOO, FOo, FoO, Foo, fOO, fOo, foO, foo, which can of course be used with Aho-Corasick (and also my new SIMD algorithm).
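The case-insensitive expansion mentioned above can be sketched in a few lines of Rust. This assumes ASCII-only case folding, and `case_variants` is a hypothetical helper, not a function from the regex crate.

```rust
/// Expand a literal into every ASCII upper/lower case variant, which is
/// the set of prefix literals implied by something like `(?i:foo)`.
fn case_variants(literal: &str) -> Vec<String> {
    let mut variants = vec![String::new()];
    for ch in literal.chars() {
        let upper = ch.to_ascii_uppercase();
        let lower = ch.to_ascii_lowercase();
        let mut next = Vec::with_capacity(variants.len() * 2);
        for prefix in &variants {
            let mut with_upper = prefix.clone();
            with_upper.push(upper);
            next.push(with_upper);
            // Digits and punctuation have no case, so add them only once.
            if lower != upper {
                let mut with_lower = prefix.clone();
                with_lower.push(lower);
                next.push(with_lower);
            }
        }
        variants = next;
    }
    variants
}
```

The set doubles with every cased letter, so in practice this kind of expansion only pays off for short literals.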

One also must wonder how well a C program using PCRE2's JIT would fare on the benchmarks game. From my experience, it would probably be near the top. It's quite fast!

[1] - https://github.com/rust-lang-nursery/regex/blob/master/src/s...

[2] - https://github.com/01org/hyperscan

igouy 3661 days ago

> One also must wonder how well a C program using PCRE2's JIT would fare on the benchmarks game.

Let's hope some C and C++ programmers take up the challenge ;-)

igouy 3662 days ago

Please don't point people to u64 -- it's no longer updated. (Note the rustc version.)

Manishearth 3662 days ago

Slower by a tiny amount, and still faster than other C implementations. It's within the error box.

Also iirc there are improvements to those benchmarks in the pipeline, idk what happened to them (Veedrac and llogiq had something in mind).

Sure, you could hand-translate C in many cases (not regex), but that would be far from idiomatic. Most of the rust solutions try to still look Rust-y.

Regarding SSE: if you care about performance and SSE, use a nightly compiler. That option exists. Rust nightly is still Rust.

valarauca1 3659 days ago

You can also just bundle Rust w/ LLVM and have it JIT compile your application on start up which'll yield huge performance gains too.

But people may get salty about binary image size.

igouy 3662 days ago

>> Remember on HN where a lot of people constantly say the benchmark game is really crappy. This is why.

Because the benchmarks game shows some programs to be faster, and you agree those programs actually would be faster? :-)

>> All 5 of these tests boil down to raw FLOPS.

Where exactly are the floating-point operations in the fannkuch-redux Rust #2 program?

Where exactly are the floating-point operations in reverse-complement?

("Seriously read the source code"?)

igouy 3662 days ago

>> fannkuch-redux why? SIMD

Look how many other programs, written in various languages, are shown ahead of the fannkuch-redux Rust #2 program.

Maybe you can write a better Rust fannkuch-redux program (even without SIMD).
