Hacker News new | ask | show | jobs
by e12e 3644 days ago
From the linked [lwn] article:

"Another example he gave demonstrates the slowness of the C runtime:

    import itertools
    sum(itertools.repeat(1.0, 100000000))
That will calculate the sum of 100 million 1.0s. But it is six times slower than the equivalent JavaScript loop. Float addition is fast, as is sum(), but the result is not.

Larry Hastings asked what it was that was slowing everything down. Modzelewski replied that it is the boxing of the numbers, which requires allocations for creating objects. Though an audience member did point out with a chuckle that you can increase the number of iterations and Python will still give the right answer, while JavaScript will not."

Reminded me about the excellent talk about Julia: "Julia: to Lisp or not to Lisp?" https://www.youtube.com/watch?v=dK3zRXhrFZY

One things he points out early is that both the C99 and the R6RS Scheme spec is 20% numerical. Because correct and (reasonably) fast numbers and arithmetic is actually pretty hard to get right on a computer - if you want to abstract away hardware "short-cuts" and allow for precise arithmetic by default.

It will be interesting to see how much type hints (eliminating some of the boxing/unboxing) will help python. And if it turns out to really be a good fit for the language -- everyone wants "free" performance, but transitioning to a (even partially) typed language is certainly not "free".

Another point with the above loop, is that ideally, even if it can't be optimized/memoized down to a constant - it really shouldn't have to be much slower than its C counterpart. Except for handling bignums in some way or other (perhaps on overflow only).

[lwn] https://lwn.net/Articles/691243/

2 comments

> Another point with the above loop, is that ideally, even if it can't be optimized/memoized down to a constant - it really shouldn't have to be much slower than its C counterpart. Except for handling bignums in some way or other (perhaps on overflow only).

Sadly, bignum handling is very expensive even with type hinting. Essentially, every time two numbers is added or subtracted you have to check for overflow:

    int z = hw_add(x, y);
    if (over/under-flowed?()) {
      bignum* bn = new bignum();
      bignum_add(to_bignum(x), to_bignum(y), &bn);
      return bn;
    }
    return z;
So most modern statically typed languages (none named, none forgotten!) actually cheat and use modular arithmetic instead. For example, the straightforward way of computing 9223372036854775807 + 1 on a 64bit system in one of those languages is likely to yield an incorrect result.

Which imho is complete bullshit because there is no reason to sacrifice correctness for performance other than to win benchmarking contests.

This overflow check is extremely cheap. Just check the overflow flag on the cpu. jo +4; jmp bignum; 6 byte. C compilers didn't support it until last year, so everyone just used inline assembly. Now we have it in gcc 5 and clang 3.4

One good trick started by lua is to use double for everything and check the mantissa of the normalized result for overflows. A double can represent the whole int32 range. With double and nan-tagging you have a unique fast representation of all primitives and can use a double word stack. Which is the biggest win. Look at how python, perl and ruby are doing this. Horrible. And then look how a fast dynamic language is doing it.

Everything is relative. :) Compared to the add instruction itself, the overflow check is very expensive. But the big problem is that your types widen. E.g:

    // x and y are hardware integers >= 0
    var z = x + y
    for (var i = 0; i < z; i++) {
        ...
    }
If your language used modular arithmetic, 'z' would also be a hardware integer and the induction variable 'i' could be kept in a cpu register. But with real arithmetic using bignums you can't assume that anymore and each iteration of the loop must do a relatively expensive comparison of 'i' against 'z'. In pseudo code:

    var z = x + y
    var i = 0
    if (!hw_int(z)) i = new_bignum(0)
    while (true) {
      // i < z
      if (hw_ints?(i, z)) {
        if (i >= z) break
      } else {        
        if (bignum_gte(i, z)) break
      }        
      ... body ...
      // i++
      if (hw_int?(i)) {
        i++;
      } else {
        i = bignum_add(i, 1)
      }
    }
No, it's super cheap. overflows happen <1%, so you mark it as UNLIKELY, and the branch prediction units just don't care about all those bignum branches.

If you know to use bigints or bignums, just type it and use their variants from the beginning.

Most better dynamic languages already use tagged ints, so they have only say ~30bits for an int and do a lot of those overflow checks and still beat the hell out of the poorly designed dynamic languages with full boxed ints or nums.

My understanding is that there is no plan to use type hints for optimization. I'm on a phone so I can't find it now, but I previous talked to some people on here about it in comments.
Please let us know if you find a reference to that - I've not seen anything either way (other than perhaps eg: cython piggybacking on the syntax for its own use - but not something strictly speaking "in (c)python". That said, according to: https://www.python.org/dev/peps/pep-0484/#rationale-and-goal...

"This PEP aims to provide a standard syntax for type annotations, opening up Python code to easier static analysis and refactoring, potential runtime type checking, and (perhaps, in some contexts) code generation utilizing type information.

Of these goals, static analysis is the most important. This includes support for off-line type checkers such as mypy, as well as providing a standard notation that can be used by IDEs for code completion and refactoring."

So optimization doesn't appear to be a non-goal - as much as not a primary goal?

I did say "no plan", not "never gonna happen." I don't think even Guido could really promise the latter.

I went back and looked, and this is the best reference I could find: https://news.ycombinator.com/item?id=9847740. It's not much more informative than the present discussion, and the best I can say is that I don't know anything about efforts to make cpython use the annotations for any changes in runtime behavior (either through code generation, or at runtime).

Optimization is only a distant goal according to Guido---I think he said so in the PEP or the dev mailing list discussion thereof.