Hacker News new | ask | show | jobs
by iscoelho 887 days ago
Here's an example of the same issue in Rust: https://dropbox.tech/infrastructure/lossless-compression-wit...

(with bounds checks) "Telling the Rust allocator to avoid zeroing the memory when allocating for Brotli improves the speed to 224 MB/s."

(without bounds checks) "Activating unsafe mode results in another gain, bringing the total speed up to 249MB/s, bringing Brotli to within 82% of the C code."

224MB/s -> 249MB/s (11% Brotli compression perf difference just by eliminating bounds checks)

2 comments

Couple issues with the linked article, the rust compiler has matured a lot since it was written I didn’t look at the benchmark or the code, but many bounce checks are elided if you use iterators instead of array access and lastly, I would want to see ZSTD benchmark written in normative rust.

I don’t doubt that all of those things in the article were true back then, but that was eight years ago. Wow.

Sure, you're not wrong, but my focus was that the benchmark is indicative of the raw performance impact of bounds checks (when they are unable to elided) in a real world algorithm (as opposed to a micro-benchmark).

With that said, convincing a compiler to elide bounds checks (especially Java's JIT compiler) is a hugely frustrating (and for some algorithms futile) task.

It could be an argument that bounds checks make up a small percentage of total application performance. However, I've profiled production Java servers where >50% of the CPU was encryption/compression. JDK implementations of those algorithms are heavily impacted by (and commonly fail to elide) bounds checks.

Performance matters!

>With that said, convincing a compiler to elide bounds checks (especially Java's JIT compiler) is a hugely frustrating (and for some algorithms futile) task.

Adding explicit checks does work to a certain degree but it can change with the compiler, and it requires to keep checking the generated assembly - not fun (no unsafe, either but still)

These explicit checks are basically what I mean by "convincing the compiler" (and sometimes, it isn't quite convinced!) - Yeah. Not fun at all.

Especially in Java, because "the assembly" can change as the JIT evolves. What is optimized today may not be tomorrow.

Some 12-13y back (time does fly) Cliff Click (hotspot architect) had a series of blogs on optimizations, incl. the lattice checks (i.e. within bounds). Was quite insightful. It was called: "Too Much Theory" [0]

>What is optimized today may not be tomorrow.

Exactly. (Also most developers will have exceptionally hard time maintaining such code)

[0]: https://web.archive.org/web/20120328222841/http://cliffc.org...

I understand where you are coming from, but without comparing the generated assembly, we are comparing an implementation of bounds checks. I think we should have a bounds check instruction and operates concurrently.

I should play around this with this using a couple RISC-V cores.

> Performance matters!

https://www.youtube.com/watch?v=r-TLSBdHe1A by Emery Berger

Another, yesbut, encryption and compression are and will handled by on die accelerators.

That's a good example, but much more inline with my expectations (not on the order of 125%)
Keep in mind that the "11%" applies to the fully implemented Brotli compression algorithm. Bounds-checks are a small part of that.

If you write a micro-benchmark for only bounds-checks, you'd see the larger difference more inline with the "125%"