Hacker News new | ask | show | jobs
by rogerbinns 4102 days ago
Note that the Python code is not limited to 32 bits and will automatically promote to bignum (aka long in py27).

    >>> l=[0xffffffff, 17]
    >>> sum(l) 
    4294967312
C based code will silently overflow/truncate, giving 16 in the example above. The author handwaved all this away, but it does show the usual tradeoffs between right/robust answers and fast answers.
2 comments

Signed integer behaviour is undefined on overflow in C.
But not on specific compilers with specific settings on specific machines like were used for the benchmarks results. Execution time is also undefined by the C standard.
"Undefined behavior" is a specific term used by the C and C++ standards. What you're describing sounds more like "unspecified behavior", something entirely different.

Undefined behavior allows for aggressive optimizations where the compiler assumes the integer overflow can never happen, leading to such wonderful bugs as this:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=30475

Some compilers may offer specific settings, such as fwrapv, which give you a way to define the behavior of integer overflow. (Although last I read fwrapv was broken.) I see no mention of any such specific setting in the article. If you've spotted one I've missed, please call it out more specifically.

Obviously if this weren't a trivial example and were real code he'd account for this in his C code, and it would still be much faster than the Python code.

He hand waved it away because it's not really relevant to the point.

The Python code was trivial to write and immediately correct and robust. Trying to get C code to do the right thing is considerably more difficult - for example even trying to detect overflow can result in code the compiler then removes - http://lwn.net/Articles/278137/

Robust, correct and fast are a hard combination to do. I do acknowledge the article is about the latter.

It's not hard if you use GMP.
The fancy library you're using in that other part of your project doesn't use GMP.
It likely doesn't need arbitrary precision arithmetic either. A lot of algorithms truly want modular arithmetic (hashes, checksums), many are dealing in domains that can't practically exceed a machine integer size (length of a string, number of vertices in a graph, etc). There is no reason that every library needs to deal in arbitrary precision.
> domains that can't practically exceed a machine integer size

This assumption is common among authors of C code, but is sometimes also exploitably incorrect. Even if you really don't care about supporting things larger than a certain size, you do still have to correctly account for overflow, not just ignore it.