Hacker News new | ask | show | jobs
by titzer 1302 days ago
> I note with fear and horror that even in 1980, language designers and users have not learned this lesson. In any respectable branch of engineering, failure to observe such elementary precautions would have long been against the law.

Here we are, 42 years later, and bounds checks are still not the default in some languages. Because performance, or something. And our computers are literally 1000x as fast as they were in 1980. So instead of paying 2% in bounds checks and getting a merge 980x faster, we get 2-3x more CVEs, costing the economy billions upon billions of dollars a year.

1 comments

Removing bounds checks is a stark example of a premature optimization.

You can remove bounds checks when you can prove that the index won't ever get out of bounds; this is possible in many cases, such as iteration with known bounds.

Isn't this a job for the compiler? The default would be to have boundary checking, but if the compiler can prove that the index is always in range it can drop the boundary check. From the user's perspective, the boundary check is always there. Most vector operations would have provable boundaries.

Edit:

Based on the benchmark code linked to in dahfizz 's comment, it seems that Rust does the above for vectors, but there are situations where the bounds can't be proved, so boundary checking can't be removed. How common is this case in practice?

To omit the check, the compiler would need to know that the loop range matches or subtends the array bound. That is commonly easy for built-in arrays, uncommonly for user-defined types. Most types are user-defined.

We trust the library author to get it right, despite (in Rust) wrapping accesses in "unsafe" or (in C++) not. Compilers are not particularly better at everything than library authors.

> would need to know that the loop range matches or subtends the array bound

Some compilers have pretty sophisticated analyses aimed at just that: determining affine relations to statically bound indexed accesses. Failing that, some compilers will resort to loop versioning, generating two versions of the loop and then partitioning the iteration space into the definitely-in-bounds range from possibly-out-of-bounds range, then selecting which portions of which loop to execute by prefixing both with dynamic checks. Couple all of that with peeling and unrolling, and bounds checks start disappearing and getting amortized away.

Libraries can do this too, in many cases more reliably.
Unless libraries are receiving a copy of the meta representation of the program and running integer equality relations over the dataflow chains, then no, not really.
This is the approach used by SPARK Ada. [0]

The norms there, from what I gather, are that you compile with runtime checks enabled unless you've used the SPARK prover tools to verify the absence of runtime errors, in which case you can safely disable runtime checks in your builds.

[0] https://docs.adacore.com/spark2014-docs/html/ug/en/usage_sce...