| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Asooka 533 days ago

It would be better if compilers were fixed. Current treatment of UB is insane. The default should be no UB and a pragma that lets you turn it on for specific parts. That used to be the case and some of the world's most performant software was written for compilers that didn't assume no pointer aliasing nor no integer overflow (case in point: Quake).

The big problem, apart from "memcpy is a bit tedious to write", is that it is impossible to guarantee absence of UB in a large program. You will get inlined functions from random headers and suddenly someone somewhere is "dereferencing" a pointer by storing it as a reference, and then your null pointer checks are deleted. Or an integer overflows and your "i>=0" array bounds assert is deleted. I have seen that happen several times and each time the various bits of code all look innocent enough until you sit down and thoroughly read each function and the functions it calls and the functions they call. So we compile with the worst UB turned off (null is a valid address, integers can overflow, pointers can alias, enums can store any integer) and honestly, we cannot see a measurable difference in performance.

UB as it is currently implemented is simply an enormous footgun. It would be a lot more useful if there were a profiler that could find parts of the code, which would benefit from UB optimisations. Then we could focus on making those parts UB-safe, add asserts for the debug/testing builds, and turn on more optimisations for them. I am quite certain nobody can write a fully UB-free program. For example, did you know that multiplying two "unsigned short" variables is UB? Can you guarantee that across all the template instantiations for custom vector types you've written you never multiply unsigned shorts? I'll leave it as an exercise to the reader as to why that is.

2 comments

dzaima 532 days ago

Strict aliasing in particular is very dependent on coding style; I too just turn it off with essentially no harm, but certain styles of code that don't involve carefully manually CSEing everything might benefit a good bit (though with it being purely type-based imo it's not that useful).

Outside of freestanding/kernel scenarios, null treatment shouldn't affect anything, and again can similarly benefit probably a decent amount of code by removing what is unquestionably just dead code. There's the question of pointer addition resulting in/having an argument of null being UB, which is rather bad for offsetof shenanigans or misusing pointers as arbitrary data, but that's a question of provenance, not anything about null.

Global integer overflow UB is probably the most questionable thing in that list, and I'd quite prefer having separate wrap-on-overflow and UB-on-overflow types (allowing having unsigned UB-on-overflow too). That said, even having had multiple cases of code that could hit integer overflow and thus UB, I don't recall any of them actually resulting in the compiler breaking things (granted, I know not to write a+1<a & similar for signed ints), whereas I have had a case where the compiler "fixed" the code, turning a `a-b < 0` into the more correct (at least in the scenario in question) `a<b`!

I do think that it would make sense to have an easy uniform way to choose some specific behavior for most kinds of UB.

link

imtringued 533 days ago

There is no point in talking with the people that worship UB for the sake of UB.

They don't understand that one of the biggest barriers to developers writing and adopting more C in their projects is the random jankiness that you get from the compilers. Instead they make C this elite thing for the few people who have read every single line of C code they themselves compiled and ran on their Gentoo installation. Stuff like having no bounds checks is almost entirely irrelevant outside of compute kernels. It doesn't get you much performance, because the branches are perfectly predictable. It merely reduces code size.

There is also the problem that the C development culture is uttlery backwards compared even to the semiconductor industry. If you want to have these ultra optimized release builds, then your development builds must scream when they encounter UB and it also means that no C program or library without an extensive test suite should be allowed onto any Linux distribution's package repository. Suddenly the cost of C programming appears to be unaffordably high.

link

ncruces 532 days ago

> Stuff like having no bounds checks is almost entirely irrelevant outside of compute kernels. It doesn't get you much performance, because the branches are perfectly predictable. It merely reduces code size.

This is just not true, or languages with bounds checks wouldn't invest so much in bounds checks elimination (both automatically, in the compiler, and by programmers, leaving hints to the compiler so it can remove bounds checks).

link