Hacker News
by tropo 2251 days ago
C has strayed very far from the original intent because compiler authors prioritized benchmark results at the expense of real-world use cases. This bad trend needs to be reversed.

Consider signed integer overflow.

The intent wasn't that the compiler could generate nonsense code if the programmer overflowed an integer. The intent was that the programmer could determine what would happen by reading the hardware manual. You'd wrap around if the hardware naturally would do so. On some other hardware you might get saturation or an exception.

In other words, overflow should wrap on all modern computers, because their hardware does. That includes x86, ARM, Power, Alpha, Itanium, SPARC, and just about everything else. I don't believe you can even buy non-wrapping hardware that has a C99 or newer compiler. Since this is likely to remain true, there is no longer any justification for retaining undefined behavior that is getting abused to the detriment of C users.
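A minimal sketch of how to get that wraparound explicitly in standard C today, assuming `<stdint.h>`'s exact-width types are available. The helper name `wrap_add_i32` is made up for illustration: it adds in `uint32_t`, where wraparound is defined by the standard, and maps the result back to `int32_t` without ever performing a signed overflow. GCC and Clang can also be told to wrap signed overflow for a whole translation unit with `-fwrapv`.

```c
#include <stdint.h>

/* Hypothetical helper: two's-complement wrapping addition for int32_t
   that never relies on signed-overflow behaviour. The addition happens
   in uint32_t (modulo 2^32, always defined), and the result is mapped
   back into int32_t range using only well-defined arithmetic. */
int32_t wrap_add_i32(int32_t a, int32_t b)
{
    uint32_t u = (uint32_t)a + (uint32_t)b;   /* wraps modulo 2^32 */

    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;                    /* fits directly */

    /* u is in [2^31, 2^32 - 1]; the wrapped value is u - 2^32.
       Compute it as (u - 2^31) - 2^31 so that no step overflows. */
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}
```

With this, `wrap_add_i32(INT32_MAX, 1)` yields `INT32_MIN`, the same bit pattern a 32-bit two's-complement add instruction produces.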


There are some add-with-saturation opcodes in SIMD ISAs with 8-bit elements. I think that includes x86_64, some recent Nvidia GPUs, and the Raspberry Pi 1's VideoCore IV, whose strange 2D-register-file vector unit was made for implementing stuff like VP8/H.264. They are, afaik, always opt-in, though.
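To illustrate how saturation differs from wraparound, here is a minimal scalar sketch of what one lane of a signed 8-bit saturating add computes (x86 SSE2 exposes this as `PADDSB` / `_mm_adds_epi8`). The name `sat_add_i8` is made up for this example.

```c
#include <stdint.h>

/* Scalar model of a signed 8-bit saturating add. The sum of two int8_t
   values is formed in int, where it cannot overflow (range -256..254),
   and is then clamped to the int8_t range instead of wrapping. */
int8_t sat_add_i8(int8_t a, int8_t b)
{
    int sum = a + b;
    if (sum > INT8_MAX) return INT8_MAX;
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}
```

With wraparound, 100 + 100 in 8 bits would become -56; with saturation it sticks at 127.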
If most C developers wanted to give up the performance they get from the compiler being able to assume `n+1 > n` for signed integer n, it would happen.
Most of the useful optimizations that are enabled by treating integer overflow as license to jump the rails could be enabled just as well by allowing implementations to behave as though integers may sometimes, non-deterministically, be capable of holding values outside their range. If integer computations are guaranteed never to have side effects beyond yielding "weird" values, programs that exploit that guarantee can be compiled into more efficient machine code than programs that must avoid integer overflow at all costs.
How is this better behavior?
Many programs are subject to two constraints:

1. Behave usefully when practical, if given valid data.

2. Do not behave intolerably, even when given maliciously crafted data.

For a program to be considered usable, point #1 may sometimes be negotiable (e.g. when given an input file which, while valid, is too big for the available memory). Point #2, however, should be considered non-negotiable.

If integer calculations that overflow are allowed to behave in loosely-defined fashion, that will often be sufficient to allow programs to meet requirement #2 without the need for any source or machine code to control the effects of overflow. If programmers have to take explicit control over the effects of overflow, however, that will prevent compilers from making any of the useful overflow-related optimizations that would be consistent with loosely-defined behavior.

Under the kind of model I have in mind, a compiler would be allowed to treat temporary integer objects as being capable of holding values outside the range of their types, which would allow it to optimize, e.g., x*y/y to x, or x+y>y to x>0, but the effects of overflow would be limited to the computation of potentially weird values. If a program would meet requirements regardless of what values a temporary integer object holds, allowing such objects to acquire weird values may be more efficient than requiring programmers to write code that prevents such values from being computed.
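A minimal sketch of that model, assuming `int64_t` plays the role of the "wider" storage a compiler might use for `int32_t` temporaries (all function names here are hypothetical). Evaluating the original expression with a wide temporary always agrees with the rewritten form, whereas evaluating it with 32-bit wraparound does not.

```c
#include <stdbool.h>
#include <stdint.h>

/* x + y > y, with the temporary sum held in int64_t. The sum may exceed
   the int32_t range (a "weird" value), but it is mathematically exact. */
bool sum_gt_wide(int32_t x, int32_t y)
{
    int64_t t = (int64_t)x + y;
    return t > y;
}

/* The rewrite a compiler could make under that model. */
bool sum_gt_rewritten(int32_t x, int32_t y)
{
    (void)y;
    return x > 0;
}

/* The same expression with guaranteed 32-bit wraparound, for contrast. */
bool sum_gt_wrapping(int32_t x, int32_t y)
{
    uint32_t u = (uint32_t)x + (uint32_t)y;                 /* wraps mod 2^32 */
    int64_t t = u > INT32_MAX ? (int64_t)u - 4294967296LL   /* reinterpret as */
                              : (int64_t)u;                 /* two's complement */
    return t > y;
}

/* x * y / y with a wide temporary; requires y != 0. The product of two
   int32_t values has magnitude at most 2^62, so it fits in int64_t and
   dividing by y gives back exactly x. */
int64_t mul_div_wide(int32_t x, int32_t y)
{
    int64_t t = (int64_t)x * y;
    return t / y;
}
```

For x = INT32_MAX and y = 1, the wide and rewritten versions both return true while the wrapping version returns false, which is why guaranteed wraparound rules out the x+y>y to x>0 rewrite but "weird temporaries" do not.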

Intolerable is too situation specific.

Integer overflows that yield "weird values" in one place can easily lead to disastrous bugs in another place. So the safest thing in general would be to abort on integer overflow. But I'm sure there are applications where that, too, is intolerable. Kinda hard to have constraint 2 then.
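The abort-on-overflow policy can be sketched per operation like this, assuming GCC or Clang: `__builtin_add_overflow` is a compiler extension rather than standard C, and `add_or_abort` is a made-up name. Both compilers also accept `-ftrapv`, which requests a trap on signed overflow in addition, subtraction and multiplication program-wide.

```c
#include <stdlib.h>

/* Abort-on-overflow signed addition. __builtin_add_overflow (GCC/Clang)
   stores the wrapped result in *out and returns true if the
   mathematically correct sum does not fit in the result type. */
int add_or_abort(int a, int b)
{
    int result;
    if (__builtin_add_overflow(a, b, &result))
        abort();            /* turn overflow into a hard stop */
    return result;
}
```

Calling `add_or_abort(INT_MAX, 1)` terminates the program via `abort()` instead of producing any value, weird or otherwise.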

Allowing a program to behave in an unpredictable and useless fashion can only be tolerable in cases where nothing the program could possibly do would be intolerable. Such situations exist, but they are rare.

Otherwise, the question of what behaviors would be tolerable or intolerable is something programmers should know, but implementations cannot. If implementations offer loose behavioral guarantees, programmers can determine whether their programs meet requirements. If an implementation offers no guarantees whatsoever, however, that is not possible.

If the only consequence of overflow is that temporary values may hold weird results, and if certain operations upon a "weird" result (e.g. assignment to anything other than an automatic object whose address is never taken) will coerce it into a possibly-partially-unspecified number within the type's range, then a program can ensure that its behavior will be acceptable regardless of what weird values result from computation.
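A minimal sketch of why such coercion is enough for requirement #2. The names `coerce_to_i32` and `lookup` are hypothetical, an `int64_t` stands in for a "weird" temporary, and clamping is just one of the coercions the model would permit (any in-range value would do).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "coercion on store": a possibly out-of-range
   temporary becomes some int32_t value when it is stored. This sketch
   clamps; the proposed model would allow any value in range. */
static int32_t coerce_to_i32(int64_t weird)
{
    if (weird > INT32_MAX) return INT32_MAX;
    if (weird < INT32_MIN) return INT32_MIN;
    return (int32_t)weird;
}

/* Reads table[base + offset]. The index is range-checked after it has
   been coerced into an ordinary int32_t, so whatever in-range value the
   coercion produces, the access stays in bounds. Overflow can only lead
   to the fallback answer, never to an out-of-bounds read. */
int lookup(const int *table, size_t len, int32_t base, int32_t offset,
           int fallback)
{
    int32_t idx = coerce_to_i32((int64_t)base + offset);
    if (idx < 0 || (size_t)idx >= len)
        return fallback;
    return table[idx];
}
```

The program never needed to prevent the overflow itself; it only needed the guarantee that the stored index is some ordinary `int32_t`.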

According to the published Rationale, the authors of C89 would have expected that something like:

    unsigned mul(unsigned short x, unsigned short y)
    { return (x*y); }
would, on most implementations, yield an arithmetically-correct result even for values of (x*y) between INT_MAX+1U and UINT_MAX. Indeed, I rather doubt they could imagine a compiler for a modern system doing anything other than yielding an arithmetically-correct result or, maybe, raising a signal or terminating the program. In some cases, however, that exact function will disrupt the behavior of its caller in nonsensical fashion. Do you think such behavior is consistent with the C89 Committee's intention as expressed in the Rationale?
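For anyone bitten by this, the usual workaround is to force the multiplication into unsigned arithmetic. This is a sketch assuming the common case of 16-bit `unsigned short` and 32-bit `int`/`unsigned`; `mul_fixed` is just an illustrative name.

```c
/* Without the cast, both unsigned short operands are promoted to (signed)
   int on platforms where int is wider than short, so x*y can overflow int,
   which is undefined behavior. Converting one operand to unsigned first
   makes the whole multiplication unsigned, which is never UB. */
unsigned mul_fixed(unsigned short x, unsigned short y)
{
    return (unsigned)x * y;
}
```

For 65535 * 65535 the product, 4294836225, exceeds INT_MAX but fits in a 32-bit `unsigned`, so this version returns the arithmetically-correct value.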
It lets you check that a+b > a (for unknown unsigned b, or signed b known to be > 0) to make sure the addition didn't overflow. For signed operands, though, I'm rather certain all modern C compilers will optimize that check out.
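For reference, a minimal sketch of overflow checks that don't depend on signed-overflow behaviour (function names are illustrative only). The unsigned test `a + b < a` is well-defined and also gets `b == 0` right; the signed test compares against the limits before adding, so no overflow ever happens.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned addition wraps by definition, so testing the wrapped sum is
   reliable and a compiler must keep the check. */
bool uadd_overflows(unsigned a, unsigned b)
{
    return a + b < a;
}

/* For signed int, "a + b > a" is evaluated after a potential overflow,
   which is undefined, so a compiler may fold it to "b > 0". Checking
   against the limits first avoids computing an overflowing sum at all. */
bool sadd_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;   /* INT_MAX - b cannot overflow here */
    return a < INT_MIN - b;       /* b <= 0, so INT_MIN - b cannot overflow */
}
```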