Hacker News new | ask | show | jobs
by jcranmer 755 days ago
> The thing is, wrap-around is not only well-defined, it's common, and EXPECTED.

No, it's really not. Do this experiment: for the next ten thousand lines of code you right, every time you do an integer arithmetic operation, ask yourself if the code would be correct if it wrapped around. I would be shocked if the answer was "yes" in as much as 1% of the time.

(The most recent arithmetic expression I wrote was summing up statistics counters. Wraparound is most definitely not correct in that scenario! Actually, I suspect saturation behavior would be more often correct than wraparound behavior.)

This is a case where I think Linus is 100% wrong. Integer overflow is frequently a problem, and demanding the compiler only check for it in cases where it's wrong amounts to demanding the compiler read the programmer's mind (which goes about as well as you'd expect). Taint tracking is also not a viable solution, as anyone who has implemented taint tracking for overflow checks is well aware.

1 comments

It depends heavily on context.

For the kernel, which deals with a lot of device drivers, ring buffers, and hashes, wraparound is often what you want. The same is likely to be true for things like microcontroller firmware and such.

In data analysis or monte carlo simulations, it's very rarely what you want, indeed.

Is it really?

For example, I opened up https://elixir.bootlin.com/linux/latest/source/drivers/firew... as a random source file in the Linux kernel, and I didn't see a single line where wraparound would be correct behavior.

There are definitely cases where wraparound behavior is correct. There are also cases hard errors on overflow isn't desirable (say, statistics counters), but it's still hard to call wraparound the correct behavior (e.g., saturation would probably work better for statistics than wraparound). There are also cases where you could probably prove that overflow can't happen. But if you made the default behavior a squawk that wraparound occurred, and instead made developers annotate all the cases where that was desirable to silence the squawk, even in the entire Linux kernel, I'd suspect you'd end up with fewer than 1000 places.

This is sort of the point of the exercise--wraparound behavior is often what you want when you think about overflow, but you actually spend so much of your time not thinking about it that you miss how frequently wraparound behavior isn't what you wanted.

I think wraparound generally is better for statistics counters like the ones in the linked code, since often you want to check the number of packets/errors per some time interval, which you can do with overflow (as long as the time interval isn't so long that you overflow within a period) but not with saturation.