by dataangel 1302 days ago
They’re nowhere near free. Branch prediction table has finite entries, instruction cache has finite size, autovectorizing is broken by bounds checks, inlining (the most important optimization) doesn’t trigger if functions are too big because of the added bounds checking code, etc. This is just not great benchmarking — no effort to control for noise.

> autovectorizing is broken by bounds checks

This is the big one. You pay a 50% penalty for actual CPU-bound, iteration-heavy code with bounds checking enabled.

https://github.com/matklad/bounds-check-cost

The proper way of addressing that is to manually hoist bounds checks out of "hot" loops, not just remove them altogether.
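A minimal sketch of what manual hoisting can look like in Rust, assuming the loop only needs the first `n` elements. The function name `sum_first_n` is hypothetical, and whether the per-iteration check actually disappears depends on the compiler version and optimization level, so it is worth confirming in the generated assembly.

```rust
/// Sums the first `n` elements of `data`.
/// `sum_first_n` is a hypothetical name used only for illustration.
pub fn sum_first_n(data: &[u32], n: usize) -> u32 {
    // The one bounds check, hoisted out of the loop:
    // this line panics if n > data.len().
    let head = &data[..n];

    let mut total = 0u32;
    for i in 0..head.len() {
        // i < head.len() is evident from the loop range, so the optimizer
        // can usually drop the check on this index.
        total = total.wrapping_add(head[i]);
    }
    total
}
```

The check is still there, so an out-of-range `n` panics before the loop starts instead of reading past the end. For example, `sum_first_n(&[1, 2, 3, 4], 3)` returns 6.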
This should be the article.

Running this with Rust 1.65 on an Intel 12400 gets a nearly 4x speedup when bounds checking is not needed. Just wow.

Bounds checking avoidance is important when it becomes a significant chunk of your hot path.

For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

If the compiler can't do that by itself, a library should do it.

The real issue is whether the information about the true size of the memory region involved is available at the point where it is needed. This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

Rust claims responsibility for enforcing safety in the compiler, with libraries using "unsafe" to delegate some of that to themselves. Users then trust the compiler and libraries to get it right. In C++, the compiler provides base semantics while libraries take up the whole responsibility for safety. Users can trust libraries similarly as in Rust, to similar effect.

Modern C++ code typically does no visible operations with pointers at all, and most often does not index directly in arrays, preferring range notation, as in Rust, achieving correctness by construction. A correct program is implicitly a safe program.
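To illustrate the "no direct indexing" style on the Rust side, here is a small sketch using iterators. The name `add_into` is hypothetical. Since nothing is indexed, there is no index to check; `zip` simply stops at the shortest input.

```rust
/// Writes a[k] + b[k] into out[k] for every position all three slices share.
/// `add_into` is a hypothetical name used only for illustration.
pub fn add_into(out: &mut [i32], a: &[i32], b: &[i32]) {
    // No explicit indices: the iterators carry the length information,
    // so the loop body contains no indexing that could go out of range.
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = x.wrapping_add(*y);
    }
}
```

Note the trade-off: a length mismatch is silently truncated here rather than reported, so callers who consider that a bug may want an explicit length check before the loop.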

> This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

What catch up does Rust need to do?

Rust has slices, which know the size of their data, built into the language, while C++ doesn't. And Rust has stricter const and mutability rules that facilitate optimizations.

As for the implementation, Rust uses LLVM, which is also the backend of one of the popular C++ compilers.

I am talking about language features that library authors can use to capture and express semantics in their libraries... but only if the language implements those features. C++ just has a lot more of them.
Like what, for example? To the contrary, I think that, other than constness, C++ has rather few facilities to communicate semantic invariants to the compiler.
And even const can't in general be used for optimizations (because there can be another reference to the same location, or one can just const_cast).
If the thread you are on doesn't modify the variable (e.g. by const_cast), and that variable isn't atomic or volatile, the compiler should be allowed to treat it as invariant. Whether it does in practice probably depends on a lot of things though.
> For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

LLVM sometimes does this, but when it doesn't, you may insert asserts to guide the optimizer, as explained here https://news.ycombinator.com/item?id=33808853
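A rough sketch of the assert trick, assuming two slices and a caller-supplied length `n`. The name `dot_asserted` is hypothetical, and the effect on code generation is not guaranteed; it depends on what the optimizer can prove from the assertion.

```rust
/// Dot product over the first `n` elements of `a` and `b`.
/// `dot_asserted` is a hypothetical name used only for illustration.
pub fn dot_asserted(a: &[i64], b: &[i64], n: usize) -> i64 {
    // One up-front check. After this line the optimizer knows that
    // every i in 0..n is a valid index into both slices.
    assert!(n <= a.len() && n <= b.len());

    let mut acc = 0i64;
    for i in 0..n {
        // These indexing operations are still bounds-checked in the source,
        // but the assert above typically lets the checks be optimized away.
        acc = acc.wrapping_add(a[i].wrapping_mul(b[i]));
    }
    acc
}
```

Without the assert, a failing index would have to panic at the exact iteration where it occurs, which is part of what blocks vectorization. With it, any failure happens before the loop.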

I think this technique works in C and C++ too (if you use clang or gcc)

Sometimes a __builtin_assume(c) may help in Clang (which is not the same as the normal assertion, which won't); with GCC the usual equivalent is `if (!(c)) __builtin_unreachable();`. Other times, you need to make a private copy of a value that the compiler could not otherwise assume will not be clobbered.
Unfortunately I only see Modern C++ in C++ conference talks and in my hobby projects.

Most of the stuff I see at work is quite far from this ideal reality, starting with Android's codebase, or the various ways C++ gets used in Microsoft frameworks.

There are choices for places to work. Maybe try another one?
At the risk of moving the goalposts: so what? The vast majority of applications running out there would not be impacted meaningfully in the least by taking that performance hit.

Bounds checking should be the default, and then only when someone has proved through benchmarking and profiling that it's actually a problem for their application, should they even consider turning it off.

Bounds checks are the easiest type of code to branch predict. You just assume they never trigger, suddenly you have a 99.99% hit rate on them. When they trigger you don't care about the branch misprediction at all because the program is already busted and security is more important.
Yeah, I didn’t find it compelling either.

If your conclusion is “no signal, just noise” boost the input until the signal becomes apparent. If that means writing such a massive loop that the program takes an hour to run, fine.
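As a sketch of what "boost the input" could look like in practice, here is a tiny timing harness using only the standard library. All names (`sum_checked`, `sum_unchecked`, `time_reps`) are hypothetical, and this is not the benchmark from the article; a real measurement would also want warm-up runs and repeated trials.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Gather-style sum: each `data[i]` carries a bounds check because
/// the indices come from another slice.
pub fn sum_checked(data: &[u64], idx: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in idx {
        total = total.wrapping_add(data[i]);
    }
    total
}

/// Same loop without the check.
/// Safety: the caller must guarantee every index in `idx` is < data.len().
pub unsafe fn sum_unchecked(data: &[u64], idx: &[usize]) -> u64 {
    let mut total = 0u64;
    for &i in idx {
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    total
}

/// Runs `f` `reps` times and returns the total elapsed time.
/// `black_box` keeps the compiler from discarding the unused results.
pub fn time_reps<F: FnMut() -> u64>(reps: u32, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..reps {
        black_box(f());
    }
    start.elapsed()
}
```

The idea is to call `time_reps` on both variants and keep raising `reps` (or the slice lengths) until the difference between them is clearly larger than the run-to-run variation of either one.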