Hacker News new | ask | show | jobs
by bjourne 3662 days ago
Rust is also slower in binarytrees, regexdna and fasta. SSE is not one "barely used corner case" because huge amounts of performance critical code takes advantage of it.

Edit: To explain why I don't believe you when you say that "Post compilation they are functionally identical [in performance]" is because if it were so, you would just transliterate the C solutions to the Rust equivalents and it would run as fast as C. Since that hasn't been done and is trivial to do, my conclusion is that it doesn't lead to the same performance.

2 comments

Did you know Rust was quite a bit faster than C in regexdna merely a few months ago? It didn't get slower because of Rust. The algorithms employed are radically different. My hope is that the regex library has already regained performance, but until the benchmark game is updated (which is on us, not the benchmark game maintainer), I suppose we'll have to suffer the pedants!

Or perhaps, you might look at single threaded performance and wonder, maybe there is something more interesting going on than a naive surface analysis of C vs. Rust! :-) https://benchmarksgame.alioth.debian.org/u64/rust.php

And by the way, transliterating a regex library isn't trivial. I invite you to transliterate Tcl's regex library. Let me know how that goes. ;-) So I think your reasoning is specious at best.

> It didn't get slower because of Rust.

Do you mean the program became relatively slower because of changes you've made to the regex crate?

Wasn't the program relatively faster because you wrote the regex crate to use Aho-Corasick for the matches required by the regex-dna task?

> Do you mean the program became relatively slower because of changes you've made to the regex crate?

Yes. The underlying reasoning is complex. When the regex crate got a lazy DFA (similar to the one used by RE2), the vast majority of regexes got significantly faster. Some got slower. This one in particular from regex-dna:

    >[^\n]*\n|\n
Before the lazy DFA, compile time analysis would notice that all matches either start with `>` or `\n` and do a fast prefix scan for them. Each match of `>` or `\n` represents a candidate for a match. Candidates were then verified using something similar to the Thompson NFA, which is generally pretty slow, but the prefix scanning reduced the amount of work required considerably.

Once the lazy DFA was added, the prefix scanning was still used, but the lazy DFA was used to verify candidates. It's faster in general by a lot, but, the lazy DFA requires two scans of the candidate: one to find the end position and another to find the start position. That extra scan made processing this regex (on the regex-dna input) slightly slower.

I've since fixed some of this by reducing a lot of the match overhead of the lazy DFA, so my hope is that it's back to par, but I haven't done any rigorous benchmarking to verify that.

> Wasn't the program relatively faster because you wrote the regex crate to use Aho-Corasick for the matches required by the regex-dna task?

Aho-Corasick is principally useful for the second phase of regex-dna, e.g., the regexes that look like `ag[act]gtaaa|tttac[agt]ct`. (In the last phase, all the regexes are just single byte literals, so neither Aho-Corasick nor the regex engine should ever be used.) Performance here should stay the same.

On that note, I have a new nightly-only algorithm called Teddy that uses SIMD[1] (which replaces the use of Aho-Corasick for those regexes) and is a bit faster. I got the algorithm from the Hyperscan[2] project, which also does extensive literal analysis to speed up regexes.

To clarify, this optimization is generally useful because a lot of regexes in the wild have prefix literals. Even something like `(?i:foo)\s+bar` can benefit from it, since `(?i:foo)` expands to FOO, FOo, FoO, Foo, fOO, fOo, foO, foo, which can of course be used with Aho-Corasick (and also my new SIMD algorithm).

One also must wonder how well a C program using PCRE2's JIT would fair on the benchmarks game. From my experience, it would probably be near the top. It's quite fast!

[1] - https://github.com/rust-lang-nursery/regex/blob/master/src/s...

[2] - https://github.com/01org/hyperscan

> One also must wonder how well a C program using PCRE2's JIT would fair on the benchmarks game.

Let's hope some C and C++ programmers take up the challenge ;-)

Please don't point people to u64 -- it's no longer updated. (Note the rustc version.)
Slower by a tiny amount, and still faster than other C implementations. It's within the error box.

Also iirc there are improvements to those benchmarks in the pipeline, idk what happened to them (Veedrac and llogiq had something in mind).

Sure, you could hand-translate C in many cases (not regex), but that would be far from idiomatic. Most of the rust solutions try to still look Rust-y.

Regarding sse, if you care about performance and sse use a nightly compiler. That option exists. Rust nightly is still Rust.

You can also just bundle Rust w/ LLVM and have it JIT compile your application on start up which'll yield huge performance gains too.

But people may get salty about binary image size.