by e12e 3643 days ago
Eh...

  $ cat <<eof > float.py
  import itertools
  s = sum(itertools.repeat(1.0, 100000000))
  print(s)
  eof

  $ time python float.py 
  100000000.0

  real    0m0.602s
  user    0m0.596s
  sys     0m0.004s

  $ time python3 float.py 
  100000000.0

  real    0m0.603s
  user    0m0.600s
  sys     0m0.000s

  $ time pypy float.py 
  100000000.0

  real    0m0.211s
  user    0m0.088s
  sys     0m0.004s
That's with no warmup for the pypy variant (or indeed the other python variants). Or, slightly more "robust":

  $ python -m timeit -s "import itertools as i" \
                 "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 594 msec per loop

  $ python3 -m timeit -s "import itertools as i" \
                 "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 592 msec per loop

  $ pypy -m timeit -s "import itertools as i" \
              "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 68.2 msec per loop
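For reference, the "best of 3" number that -m timeit prints is the minimum over the repeated runs, divided by the loop count. A rough sketch of the same measurement from inside Python, using the standard timeit.repeat. The helper name best_of and the smaller default item count are my own choices to keep it quick; they are not what was run above:

```python
import timeit


def best_of(n_items=10**6, loops=10, repeats=3):
    """Best per-loop time in seconds for sum(itertools.repeat(1.0, n_items)).

    Hypothetical helper: mirrors "N loops, best of R" from `python -m timeit`.
    """
    totals = timeit.repeat(
        stmt="sum(i.repeat(1.0, n))",
        setup="import itertools as i",
        globals={"n": n_items},  # makes n visible to the timed statement
        number=loops,            # loops per measurement
        repeat=repeats,          # number of measurements
    )
    # Each entry is the total time for `loops` runs; take the fastest.
    return min(totals) / loops


if __name__ == "__main__":
    print("%d loops, best of %d: %.3g msec per loop" % (10, 3, best_of() * 1e3))
```

Taking the minimum rather than the mean is what filters out most of the noise from other processes on the box.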
PyPy actually does pretty well here:

  $ cat float.cpp 
  #include<iostream>

  int main() {
    double s = 0;
    for (int i = 0; i < 100000000; ++i) {
        s++;
    }

    std::cout << s << std::endl;
    return 0;
  }

  $ g++ --std=c++14 -O3 float.cpp -o float
  $ time ./float
  1e+08

  real    0m0.237s
  user    0m0.236s
  sys     0m0.000s
Note that the C++ code uses a loop, not a lazy generator. Apparently those may be coming in C++17 as proposal N4286.

Summing a list of numbers is easy mode for a JIT. You've got a tight loop with one type, which can be statically shown never to be violated in real time. Unfortunately, unless that's actually your workload, the speed with which a JIT-based system can add numbers is not relevant to how fast it runs in practice. Any JIT that can't tie C on that workload is broken somehow.

Personally, I think people often go quite overboard with the "benchmarks are useless" idea, but this benchmark really is useless, because it will never produce any differences between JITs and thus can't show whether one is good or bad.

> it will never produce any differences between JITs and thus can't show whether one is good or bad

It can tell you which JITs can't even manage to remove the loop, which is useful to know.

Apparently none of CPython, PyPy, or gcc manages to remove the loop in this case. I actually think it is interesting that this "slow" code in CPython is within [ed: ~10x] of PyPy/JIT/machine code (C++ probably should do better; I'm not all that familiar with gcc, so maybe -O3 isn't enough to make it unroll loops and/or vectorize).
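To put numbers on that, here is a quick sketch that just divides the timings posted earlier in the thread. The dictionary and function names are made up for illustration; the figures are copied from the runs above, and the wall-clock ones include interpreter or process startup:

```python
# Timings copied from the runs earlier in this thread, in milliseconds.
timeit_ms = {"python": 594.0, "python3": 592.0, "pypy": 68.2}
wall_ms = {"python": 602.0, "python3": 603.0, "pypy": 211.0, "c++": 237.0}


def ratio(table, slow, fast):
    """How many times slower `slow` is than `fast` in the given table."""
    return table[slow] / table[fast]


print("timeit: cpython/pypy = %.1fx" % ratio(timeit_ms, "python", "pypy"))  # 8.7x
print("wall:   cpython/pypy = %.1fx" % ratio(wall_ms, "python", "pypy"))    # 2.9x
print("wall:   c++/pypy     = %.1fx" % ratio(wall_ms, "c++", "pypy"))       # 1.1x
```

So the warmed-up timeit gap is a bit under 9x, and on a single cold run it is under 3x.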

Actually code like this arguably should be a win for a high-level language with an optimization pass; ideally the whole thing should be translated to a constant at compile-time.

Ah right I think that's because the accumulator is a double. I missed that. I think it should still be possible but compilers probably don't bother.
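One way to see why a double accumulator is harder to fold: floating-point addition is not associative, so replacing the loop with the closed form is only valid if every intermediate sum is exactly representable. A small sketch of where repeated + 1.0 stops being exact (the function name add_ones is made up for illustration, and this says nothing about what gcc actually checks):

```python
def add_ones(start, count):
    """Add 1.0 to start, count times, the way the benchmark loop does."""
    s = float(start)
    for _ in range(count):
        s += 1.0
    return s


limit = 2.0 ** 53  # above this, not every integer is representable as a double

# Well below the limit, the loop and the closed form agree exactly.
assert add_ones(0, 1000) == 1000.0

# At the limit, limit + 1.0 rounds back to limit, so the loop gets stuck
# while the closed form start + count keeps growing.
assert add_ones(limit, 10) == limit
assert limit + 10.0 != add_ones(limit, 10)
```

The 100000000 iterations in the benchmark are far below 2**53, so folding to a constant would give the same answer here; the compiler just has to be willing to prove it.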
That I don't necessarily expect from a JIT in real time. I'd expect it from any half-decent optimizing compiler, but I'd expect it to likely be the result of several interacting and too-expensive-for-real-time optimizations.

I mean, if the JIT works that out, great, and if someone wants to show off performance numbers that show one can do that, I'm interested in the information, but I wouldn't in general discard one for failing to notice that optimization.

Real life coding is not a loop.

Try that on a web server or an image processing box.

I have a feeling 100% of the c++ time is being spent in some silliness like setting up the locale of the ostream, because my compiler totally eliminates that loop.
Probably. I glanced at the asm to make sure the loop was still there (which it was, possibly because it loops over an int, and sums doubles?) and couldn't see anything that stood out. Still a little surprised that pypy without warmup is faster than c++ for this silly thing.

[ed: On this system, eliminating the loop by hand looks like:

  #include<iostream>
  int main() {
    double s = 100000000;

    std::cout << s << std::endl;
    return 0;
  }

  $ g++ --std=c++14 -O3 float2.cpp -o float2 \
    && time ./float2
  1e+08

  real    0m0.001s
  user    0m0.000s
  sys     0m0.000s
Just for completeness.]