| HN Mirror

by e12e 3643 days ago

Eh...

  $ cat <<eof > float.py
  import itertools
  s = sum(itertools.repeat(1.0, 100000000))
  print(s)
  eof

  $ time python float.py 
  100000000.0

  real    0m0.602s
  user    0m0.596s
  sys     0m0.004s

  $ time python3 float.py
  100000000.0

  real    0m0.603s
  user    0m0.600s
  sys     0m0.000s

  $ time pypy float.py 
  100000000.0

  real    0m0.211s
  user    0m0.088s
  sys     0m0.004s

That's with no warmup for the pypy variant (or indeed the other python variants). Or, slightly more "robust":

  $ python -m timeit -s "import itertools as i" \
                "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 594 msec per loop

  $ python3 -m timeit -s "import itertools as i" \
                 "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 592 msec per loop

  $ pypy -m timeit -s "import itertools as i" \
              "sum(i.repeat(1.0, 100000000))"
  10 loops, best of 3: 68.2 msec per loop
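
The same kind of measurement can also be taken from inside Python with the `timeit` module, which keeps interpreter startup out of the number. A minimal sketch, assuming a smaller repeat count (`N`, chosen here only so it finishes quickly) and a hypothetical helper name `best_time`; neither is from the original post:

```python
import itertools
import timeit

# Assumption: far smaller than the 100000000 used above, to keep runs short.
N = 1_000_000


def best_time(n=N, loops=5, repeats=3):
    """Return the best per-loop time in seconds for summing n copies of 1.0."""
    timer = timeit.Timer(lambda: sum(itertools.repeat(1.0, n)))
    # repeat() returns one total time per repeat; report the minimum,
    # which is what the command-line "best of 3" figure corresponds to.
    totals = timer.repeat(repeat=repeats, number=loops)
    return min(totals) / loops


if __name__ == "__main__":
    print("best of 3: %.3f msec per loop" % (best_time() * 1e3))
```

Taking the minimum rather than the mean is the usual choice here, since slower runs mostly reflect interference from the rest of the system.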

PyPy actually does pretty well here:

  $ cat float.cpp 
  #include<iostream>

  int main() {
    double s = 0;
    for (int i = 0; i < 100000000; ++i) {
        s++;
    }

    std::cout << s << std::endl;
    return 0;
  }

  $ g++ --std=c++14 -O3 float.cpp -o float
  $ time ./float
  1e+08

  real    0m0.237s
  user    0m0.236s
  sys     0m0.000s

Note that the C++ code uses a loop, not a lazy generator. Apparently those may be coming in C++17 as proposal N4286.

3 comments

jerf 3643 days ago

Summing a list of numbers is easy mode for a JIT. You've got a tight loop with one type, and it can be statically shown that this will never be violated in real-time. Unfortunately, unless that's actually your workload, the speed with which a JIT-based system can add numbers is not relevant to how fast it runs in practice. Any JIT that can't tie C on that workload is broken somehow.

Personally, I think people often go quite overboard with the "benchmarks are useless" idea, but this benchmark really is useless, because it will never produce any differences between JITs and thus can't show whether one is good or bad.

chrisseaton 3643 days ago

> it will never produce any differences between JITs and thus can't show whether one is good or bad

It can tell you which JITs can't even manage to remove the loop, which is useful to know.

e12e 3643 days ago

Apparently none of CPython, PyPy, or gcc manages to remove the loop in this case. I actually think it is interesting that this "slow" code in CPython is within [ed: ~10x] of PyPy/JIT/machine code (C++ probably should do better; I'm not all that familiar with gcc, and maybe -O3 isn't enough to make it unroll loops and/or vectorize).

Actually code like this arguably should be a win for a high-level language with an optimization pass; ideally the whole thing should be translated to a constant at compile-time.
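
To make the "translated to a constant" idea concrete: for this particular input the fold would be safe, because every integer up to 2**53 is exactly representable as a double, so each partial sum of 1.0s is exact. A small sketch, assuming a reduced count and the hypothetical names `loop_sum` and `folded_sum` (not from the original post):

```python
import itertools


def loop_sum(n):
    """What the benchmark actually does: add 1.0 to an accumulator n times."""
    return sum(itertools.repeat(1.0, n))


def folded_sum(n):
    """What an ideal optimizer could emit instead: a single constant."""
    return float(n)


if __name__ == "__main__":
    n = 1_000_000  # assumption: reduced from 100000000 to keep the check quick
    print(loop_sum(n) == folded_sum(n))  # True: both results are exact
```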

chrisseaton 3643 days ago

Ah right I think that's because the accumulator is a double. I missed that. I think it should still be possible but compilers probably don't bother.
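
A rough illustration of why a double accumulator makes this harder in general: floating-point addition rounds at every step, so repeated addition and a single multiplication need not agree, and a compiler cannot blindly swap one for the other. The sketch below uses an explicit loop to mirror the C++ accumulator; the helper name `accumulate` and the value 0.1 are illustrative assumptions, not from the thread:

```python
def accumulate(x, n):
    """Add x to a float accumulator n times, rounding after every step."""
    s = 0.0
    for _ in range(n):
        s += x
    return s


if __name__ == "__main__":
    looped = accumulate(0.1, 10)
    folded = 0.1 * 10
    print(looped)            # 0.9999999999999999
    print(folded)            # 1.0
    print(looped == folded)  # False: the rewrite would change the result
```

With 1.0 as the addend the two forms do agree, but proving that requires reasoning about the specific values involved, which may be more than an optimizer attempts by default.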

jerf 3643 days ago

That I don't necessarily expect from a JIT in real time. I'd expect it from any half-decent optimizing compiler, but I'd expect it to likely be the result of several interacting and too-expensive-for-real-time optimizations.

I mean, if the JIT works that out, great, and if someone wants to show off performance numbers that show one can do that, I'm interested in the information, but I wouldn't in general discard one for failing to notice that optimization.

brianwawok 3643 days ago

Real life coding is not a loop.

Try that on a web server or an image processing box.

honkhonkpants 3643 days ago

I have a feeling 100% of the c++ time is being spent in some silliness like setting up the locale of the ostream, because my compiler totally eliminates that loop.

e12e 3643 days ago

Probably. I glanced at the asm to make sure the loop was still there (which it was, possibly because it loops over an int, and sums doubles?) and couldn't see anything that stood out. Still a little surprised that pypy without warmup is faster than c++ for this silly thing.

[ed: On this system, eliminating the loop by hand looks like:

  #include<iostream>
  int main() {
    double s = 100000000;

    std::cout << s << std::endl;
    return 0;
  }

  $ g++ --std=c++14 -O3 float2.cpp -o float2 \
    && time ./float2
  1e+08

  real    0m0.001s
  user    0m0.000s
  sys     0m0.000s

Just for completeness.]
