by gchpaco 6024 days ago
For years we have been awash in buffer overflows and similar errors (printf format strings come to mind) that are simply impossible in a safer language. SQL injection can still happen in a safer language, but you can't take over the web server with it. There is nothing fundamental about systems languages that requires unsafe array operations. This is a flaw, specifically a flaw of C, and one inherited by many C-descended languages. This is not some ivory tower thing that was discovered after C was designed; it was apparent even at the time (although Pascal's fix was pretty bad, variable length arrays fix it neatly). There are compiler articles from the late 70s and early 80s pointing out how even a naïve compiler could easily optimize out bounds checking in most operations!

If you design the software correctly, then array bounds checking is often a waste of resources. For a stupid example, let's assume you have 3 arrays of the same size and you are doing this:

  for (i = 1; i < 10000; i++)
  {
    a[i] = i * i;
    b[i] = a[i] * i;
    c[i] = b[i] * i; 
  }
Now that's not a lot of code, but with array bounds checking you add roughly 50,000 bounds checks (five array accesses in each of 9,999 iterations) that do nothing useful if the arrays are of the correct size. Clearly there are uses where those bounds checks are useful, but when you care about speed they can become fairly costly.
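To make that count concrete, here is a rough sketch of what per-access checking amounts to if you write it out by hand. The names `checked`, `run_checked` and the `checks` counter are made up for illustration, and the arrays are assumed to be `long long` so that i*i*i*i does not overflow. A real checking compiler would emit something equivalent inline rather than call a helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

#define N 10000

static unsigned long checks = 0;   /* number of bounds checks executed */

/* Hypothetical helper: validate idx against len before every access. */
static size_t checked(size_t idx, size_t len)
{
    checks++;
    if (idx >= len) {
        fprintf(stderr, "index %zu out of range (len %zu)\n", idx, len);
        abort();
    }
    return idx;
}

/* The loop from above, with each of the five accesses checked. */
static unsigned long run_checked(void)
{
    static long long a[N], b[N], c[N];   /* assumed element type */

    checks = 0;
    for (size_t i = 1; i < N; i++) {
        long long v = (long long)i;
        a[checked(i, N)] = v * v;
        b[checked(i, N)] = a[checked(i, N)] * v;
        c[checked(i, N)] = b[checked(i, N)] * v;
    }
    return checks;
}
```

Calling `run_checked()` returns 49995, which is the "roughly 50,000" figure: five checks in each of the 9,999 iterations.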

You might even want to rewrite the loop like this, because it really is faster:

  for (i = 1; i < 10000; i++)
  {
    c[i] = i * (b[i] = i * (a[i] = i * i));
  }
PS: Ugly C code often has a very good reason for looking the way it does.
Those are precisely the examples where even the most naive late 70s compiler can optimize out the bounds checking, as is well documented in the literature (presuming the sizes are >= 10000, of course). All it takes is a validity range: note that i goes from 1 to 9999, so at each access i is within the range of the array. To defeat the compiler's optimizations you need to at least start doing nontrivial mathematics on the indices, and those are also the accesses that are the least obvious and thus most in need of bounds checks.
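Written out by hand, the range argument looks roughly like the sketch below: one comparison before the loop stands in for every per-access check. This is only an illustration of the idea, not what any particular compiler emits. `fill_hoisted`, its parameters, and the `long long` element type are assumptions, and `n` is assumed small enough that i*i*i*i fits in a `long long`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill a, b, c for indices 1..n-1, where each array holds len elements.
 * Returns 0 on success, -1 if the loop range would leave the arrays.
 */
static int fill_hoisted(long long *a, long long *b, long long *c,
                        size_t len, size_t n)
{
    /* Single hoisted check: i only ever takes values in [1, n-1],
       so n <= len proves every access below is in range. */
    if (n > len)
        return -1;

    for (size_t i = 1; i < n; i++) {
        long long v = (long long)i;
        a[i] = v * v;       /* no per-access check needed */
        b[i] = a[i] * v;
        c[i] = b[i] * v;
    }
    return 0;
}
```

With `len` and `n` both 10000 this does one comparison instead of about 50,000. The trick stops working once the index is something like `a[f(i)]` for a function the compiler can't reason about, which is exactly the case where you'd want the check to stay.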
I would be impressed if that worked correctly on multithreaded code, or if it could survive some of the more esoteric pointer manipulations you can do in C. It's mathematically impossible to make the perfect language for all problems, so while many languages have been built as a "safe" version of C, they lose something for everything they gain.

PS: Even compiler bugs can be useful under the correct circumstances.