by MaxBarraclough 1549 days ago

> Valgrind offers bounds checks with malloc() allocations and probably other allocations

Right, and Valgrind is very impressive, but it's a very intrusive and heavyweight tool, rather than a compiler flag. (I often mention Valgrind by name in these sorts of discussions [0]; we're thinking along the same lines here.)

> I imagine runtime checks could be made possible for any allocator by offering a slice(ptr, type, count) builtin call

It would presumably need 'fat pointers', which have ABI issues.

It would need to cope with statically allocated arrays, heap-allocated arrays, and arrays with automatic lifetime, including VLAs and alloca. It would also need to cope with passing a pointer mid-way into an array (something forbidden in, say, Java).

There would be considerable payoff to solving this problem, but it isn't an easy problem to solve, which is why no compiler does so.
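
To make the fat-pointer idea concrete, here is a rough sketch of what a hand-rolled checked slice might look like in C. The names `int_slice`, `slice_make`, `slice_sub` and `slice_get` are hypothetical, not part of any compiler or library, and this only shows the bookkeeping. It does nothing about the ABI problem, and nothing stops code from bypassing it with a raw pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fat pointer: a base pointer plus an element count. */
typedef struct {
    int   *ptr;
    size_t len;
} int_slice;

/* Wrap existing storage (static, heap or automatic) in a slice.
   Assumes the caller passes the true element count; nothing verifies it. */
int_slice slice_make(int *ptr, size_t len)
{
    assert(ptr != NULL);
    int_slice s = { ptr, len };
    return s;
}

/* Take a sub-slice starting mid-way into the array.
   Returns false, leaving *out untouched, if the range does not fit. */
bool slice_sub(int_slice s, size_t start, size_t count, int_slice *out)
{
    if (start > s.len || count > s.len - start)
        return false;
    out->ptr = s.ptr + start;
    out->len = count;
    return true;
}

/* Checked read: reports failure instead of reading out of bounds. */
bool slice_get(int_slice s, size_t i, int *out)
{
    if (i >= s.len)
        return false;
    *out = s.ptr[i];
    return true;
}
```

The sub-slice carries its own length, which is how the "pointer mid-way into an array" case can stay checked. The cost is that every pointer becomes two words, and that is where the ABI trouble comes from.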

> I don't really see the language causing any problems, it's more what compiler output and optimizations we've come to expect.

No, it's the language. The behaviour of modern optimising compilers is consistent with the way the language is defined. The way the language is defined, and the categories of undefined behaviour that it permits, are very consequential.

I already gave the example of how arrays work differently in Ada, such that Ada compilers can easily add bounds checks for dev builds, whereas C compilers cannot, so you need Valgrind, which few people use. (Well, more properly, arrays work differently in C than in just about every other language.) I could give a dozen other examples of ways C permits you to introduce serious bugs into your codebase which other languages robustly defend against.

Again, many categories of bug simply cannot occur in other safer languages, because of the way safe languages are defined: use-after-free, double-free, signed integer overflow, read-before-write, divide-by-zero, out-of-bounds access, misaligned access, data races. Using a language like Safe Rust closes the door on every one of those kinds of undefined behaviour.

Recall that these are precisely the kinds of errors that result in major security vulnerabilities. This isn't purely academic. Safe languages like Safe Rust (or even plain old Java, although Java has some unsafe corners) stop those issues arising, and that means better security.

Also, as we've discussed, the way C is defined makes it difficult for C compilers to use robust compile-time or even run-time checks to detect undefined behaviour. The result is that major C codebases get delivered with undefined behaviour bugs, which are a common source of major security vulnerabilities.

> I'm not saying you have to like the outcome but this is not "UB" for me - in the sense that it's been utterly obvious from day 1 that this type of issue can cause corruption (because how would it not?), long before there was any talk about UB.

Respectfully there is no such thing as 'UB for me'. It's an accepted technical term-of-art with a clear definition. People do PhDs on this topic. [1]

You're not engaging with the points I've made about how other languages are defined in such a way as to prevent these issues arising. You seem to be focussing on how it's the programmer's fault, which isn't the point at all.

Also, operations which cause data-corruption in C aren't always this way in other languages. In Java, an out-of-bounds write (into an array) results in an exception being thrown, for instance. In verified SPARK Ada, out-of-bounds writes cannot arise in the first place.

> long before there was any talk about UB

Prior to the standardisation of the C language that may have been true in the sense that perhaps the term undefined behaviour had not yet been coined, but that doesn't have any bearing on our discussion. C was always an unsafe language.

> I think it's obvious that this sort of code is extremely likely to break

And yet it didn't occur to the experienced C++ programmer. The whole idea of non-POD types isn't obvious to a C++ programmer who started out with C.

To understand C or C++ well you need to essentially know the language spec. You can't simply learn by doing, as you may easily be tricked into thinking that bad code is correct and robust. It isn't obvious that unsigned integer addition overflows by wrapping whereas signed integer addition overflow causes undefined behaviour.
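
As a small illustration of that last point, the sketch below contrasts the two cases. Unsigned addition is defined to wrap, so it can simply be performed. Signed addition has to be checked before it is carried out, because the overflow itself is the undefined behaviour. `wrapping_add_unsigned` and `checked_add_int` are hypothetical helper names, not standard functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Unsigned arithmetic wraps modulo UINT_MAX + 1; this is fully defined. */
unsigned wrapping_add_unsigned(unsigned a, unsigned b)
{
    return a + b;
}

/* Signed overflow is undefined, so the operands are tested first and
   the addition is only performed when the result is representable. */
bool checked_add_int(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *out = a + b;
    return true;
}
```

Note that the tempting shortcut of adding first and then testing whether the result "went negative" is itself undefined for `int`, which is exactly the sort of thing you don't pick up by learning through trial and error.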

Google's Chrome team are unable to keep undefined behaviour out of their (necessarily large and complex) C++ codebase. It's unlikely that you're smarter than them. Even if you are, undefined behaviour continues to be a problem for real C/C++ codebases, resulting in a steady stream of security issues.

[0] https://news.ycombinator.com/item?id=30580138

[1] https://en.wikipedia.org/wiki/John_Regehr

jstimpfle 1548 days ago

> Respectfully there is no such thing as 'UB for me'.

Arguing like that and then saying I'm not "engaging with the points you've made" after you've been talking completely past the points of my OP, well... you're being a bit of a pain in the butt. I totally get your points, so let's agree that we're just looking for different things.

(Btw, a couple of days ago I did try Rust once again for a few hours, intending to convert a simple toy project to it. After a few hours of fighting the compiler, editing boilerplate files, looking for the right crates for basic Win32 interop, waiting for downloads, etc... I quit without making it work. Nothing changed in my feeling that without an extreme (or potentially infinite) investment of energy, C will continue to be more productive for me personally, for what I do - despite all its flaws).