Hacker News new | ask | show | jobs
by Jach 1302 days ago
Always amuses me that it's current year and people think about turning off checks, even when they're pretty much free in modern* (since 1993 Pentium, which got like 80% accuracy with its primitive branch prediction?) CPUs...

"Around Easter 1961, a course on ALGOL 60 was offered … After the ALGOL course in Brighton, Roger Cook was driving me and my colleagues back to London when he suddenly asked, "Instead of designing a new language, why don't we just implement ALGOL60?" We all instantly agreed--in retrospect, a very lucky decision for me. But we knew we did not have the skill or experience at that time to implement the whole language, so I was commissioned to design a modest subset. In that design I adopted certain basic principles which I believe to be as valid today as they were then.

"(1) The first principle was security: The principle that every syntactically incorrect program should be rejected by the compiler and that every syntactically correct program should give a result or an error message that was predictable and comprehensible in terms of the source language program itself. Thus no core dumps should ever be necessary. It was logically impossible for any source language program to cause the computer to run wild, either at compile time or at run time. A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interests of efficiency on production runs. Unanimously, they urged us not to -- they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous. I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law."

-Tony Hoare, 1980 Turing Award Lecture (https://www.cs.fsu.edu/~engelen/courses/COP4610/hoare.pdf)

4 comments

> I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

Here we are, 42 years later, and bounds checks are still not the default in some languages. Because performance, or something. And our computers are literally 1000x as fast as they were in 1980. So instead of paying 2% in bounds checks and getting a merge 980x faster, we get 2-3x more CVEs, costing the economy billions upon billions of dollars a year.

Removing bounds checks is a stark example of a premature optimization.

You can remove bounds checks when you can prove that the index won't ever get out of bounds; this is possible in many cases, such as iteration with known bounds.

Isn't this a job for the compiler? The default would be to have boundary checking, but if the compiler can prove that the index is always in range it can drop the boundary check. From the user's perspective, the boundary check is always there. Most vector operations would have provable boundaries.

Edit:

Based on the benchmark code linked to in dahfizz 's comment, it seems that Rust does the above for vectors, but there are situations where the bounds can't be proved, so boundary checking can't be removed. How common is this case in practice?

To omit the check, the compiler would need to know that the loop range matches or subtends the array bound. That is commonly easy for built-in arrays, uncommonly for user-defined types. Most types are user-defined.

We trust the library author to get it right, despite (in Rust) wrapping accesses in "unsafe" or (in C++) not. Compilers are not particularly better at everything than library authors.

> would need to know that the loop range matches or subtends the array bound

Some compilers have pretty sophisticated analyses aimed at just that: determining affine relations to statically bound indexed accesses. Failing that, some compilers will resort to loop versioning, generating two versions of the loop and then partitioning the iteration space into the definitely-in-bounds range from possibly-out-of-bounds range, then selecting which portions of which loop to execute by prefixing both with dynamic checks. Couple all of that with peeling and unrolling, and bounds checks start disappearing and getting amortized away.

Libraries can do this too, in many cases more reliably.
This is the approach used by SPARK Ada. [0]

The norms there, from what I gather, are that you compile with runtime checks enabled unless you've used the SPARK prover tools to verify the absence of runtime errors, in which case you can safely disable runtime checks in your builds.

[0] https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce...

They’re nowhere near free. Branch prediction table has finite entries, instruction cache has finite size, autovectorizing is broken by bounds checks, inlining (the most important optimization) doesn’t trigger if functions are too big because of the added bounds checking code, etc. This is just not great benchmarking — no effort to control for noise.
> autovectorizing is broken by bounds checks

This is the big one. You pay a 50% penalty for actual CPU bound, iteration heavy code with bounds checking enabled.

https://github.com/matklad/bounds-check-cost

The proper way of addressing that is to manually hoist bound checks out of "hot" loops. Not just remove them altogether.
This should be the article.

Running this with 1.65 on an Intel 12400 gets a nearly 4x speedup when bounds checking is not needed. Just wow.

Bounds checking avoidance is important when it becomes a significant chunk of your hot-path.

For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

If the compiler can't do that by itself, a library should do it.

The real issue is whether the information about the true size of the memory region involved is available at the point where it is needed. This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

Rust claims responsibility for enforcing safety in the compiler, with libraries using "unsafe" to delegate some of that to themselves. Users then trust the compiler and libraries to get it right. In C++, the compiler provides base semantics while libraries take up the whole responsibility for safety. Users can trust libraries similarly as in Rust, to similar effect.

Modern C++ code typically does no visible operations with pointers at all, and most often does not index directly in arrays, preferring range notation, as in Rust, achieving correctness by construction. A correct program is implicitly a safe program.

> This may come down to how good the language is at capturing desired semantics in a library. Rust still has a long way to go to catch up with C++ on this axis, and C++ is not waiting around.

What catch up does Rust need to do?

Rust has slice that know the size of its data built in the language, while C++ doesn't. And Rust has stricter const and mutability rules that facilitates optimizations.

As for the implementation, Rust use LLVM which is also the backend used by one of the popular C++ compiler.

I am talking about language features that library authors can use to capture and express semantics in their libraries... but only if the language implements those features. C++ just has a lot more of them.
Like what, for example? To the contrary, I think that, other than constness, C++ has rather few facilities to communicate semantic invariants to the compiler.
And event const can't in general be used for optimizations (because there can be another reference to the same location, or one can just const_cast)
> For real programs, you should demand that the compiler hoist such checks out of the loop, which may then be vectorized the usual way.

LLVM sometimes does this, but when it doesn't, you may insert asserts to guide the optimizer, as explained here https://news.ycombinator.com/item?id=33808853

I think this technique works in C and C++ too (if you use clang or gcc)

Sometimes a __builtin_assert(c) may help (which is not the same as the normal assertion, which won't). Other times, you need to make a private copy of a value that the compiler could not otherwise assume will not be clobbered.
Unfortunely I only see Modern C++ on C++ conference talks and on my hobby projects.

Most of the stuff I see at work, is quite far from this ideal reality, starting with Android's codebase, or the various ways C++ gets used in Microsoft frameworks.

There are choices for places to work. Maybe try another one?
At the risk of moving the goalposts: so what? The vast majority of applications running out there would not be impacted meaningfully in the least by taking that performance hit.

Bounds checking should be the default, and then only when someone has proved through benchmarking and profiling that it's actually a problem for their application, should they even consider turning it off.

Bounds checks are the easiest type of code to branch predict. You just assume they never trigger, suddenly you have a 99.99% hit rate on them. When they trigger you don't care about the branch misprediction at all because the program is already busted and security is more important.
Yeah, I didn’t find it compelling either.

If your conclusion is “no signal, just noise” boost the input until the signal becomes apparent. If that means writing such a massive loop that the program takes an hour to run, fine.

I have removed signed-integer based values bounds checking in a compiler once and before I noticed, I got a nice 3.8% performance gain in a large diverse benchmarking suite. While not expensive, bounds checks are certainly not free.
And that's the thing, really. Sure, they're not free, but they're pretty cheap. In a project like OpenSSL, if C had bounds checking, it should be enabled, all the time. Sure, disable it for some random bit of custom high-perf software where you really need to eke out the last few percent of performance. But for 99% of everything else, leave the bounds checking turned on.
> Sure, disable it for some random bit of custom high-perf software where you really need to eke out the last few percent of performance.

And that's the rub, isn't it? Getting that last bit of perf in the inner hot loop has traditionally often required people to write the entire thing in languages that are unsafe (ffi overhead often being enough that you can't just wrap up the hot loop and call the bit that needs to be performant from elsewhere).

I wonder how much good interop could get us.

I don't know, because if you're writing programs for things like IoT or embedded, then you're dealing with nothing but wimpy little low power processors where removing a bounds check gives you huge increases. Then, it makes no sense to have checks by default if you're going to get rid of them anyway.
I'll just point out that a wimpy low power processor that costs less than a buck has as much processing power as a 486. And the amount of processing tends to be on small amounts of data in short bursts.

Big problem as I see it is compiler writers using C++ with it's completely broken language and compilation model. Of course they think speed is the only important metric because C++ compilers compiling C++ compilers is very very slow. And they're dealing with trusted data 100% of the time and not the guys desperately trying to patch a zero day vulnerability at 2am.

if it’s a 4% gain. then it’s the same 4% game no matter if running on a big fast cpu or a 8bit micro controller. right?
It isn't because those microcontrollers are not that smart about branch prediction and instruction ordering so the penalty is often bigger. Same with reordering - might not matter that much on a big modern CPU but may matter way more on a device with a micro cache and very slow division for example.
If you want to experience what using a "safe" language looks like, complete with bounds checking amongst other things, there's the JavaScript ecosystem.

...which has evolved its own, much worse, horrors instead.

Before that, there was Java.

Do not want.