by insertion 4194 days ago
I noticed that PyPy does not show any real speed improvement over CPython in these benchmarks: https://www.techempower.com/benchmarks/

If you filter by Python, you can compare some results running on both CPython and PyPy. I would be curious to know what it is about these benchmarks that makes PyPy perform poorly. I would also be interested to see how Nuitka performs.

At the moment I'm also very excited about Pyston from Dropbox: https://github.com/dropbox/pyston

3 comments

I'm supremely sceptical about those benchmarks. Go take a look at the code being tested. I would welcome a serious look at development time versus resources used, using code that is plausible in production. I read the code used to test Django. It's not reasonable code.
It's an open-source benchmark comparison. If you can improve it, shoot them a pull request!

https://github.com/TechEmpower/FrameworkBenchmarks

I think PyPy is of interest not only for its speed benefits but also because of pypy-stm, which attempts to circumvent the GIL.
Correct me if I'm wrong here, but isn't that only helpful if you have multithreaded python programs? I have found that if my process is too slow, I can consider porting it to numpy/numba, cython, using pypy or dividing up the work using multiprocessing. multiprocessing is barely more work than using the threading module, and completely avoids the GIL AFAIK.
Well, multiprocessing (the module) is problematic because not everything can be pickled. If you are working with simple functions this can be fine, but often we use third-party libraries that use lambdas (which can't be pickled). Often you don't know why something can't be pickled.
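
To make the pickling point concrete, here is a minimal sketch using only the standard library. The helper name can_pickle is made up for illustration, and the exact exception type raised can differ between Python versions, so it catches a few.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# Built-in functions are pickled by reference to their importable name.
builtin_ok = can_pickle(abs)

# A lambda has no importable name, so pickle cannot look it up later.
lambda_ok = can_pickle(lambda x: x * 2)

print(builtin_ok, lambda_ok)
```

The same failure is what surfaces when a lambda (or an object holding one) has to be sent to a worker process by multiprocessing.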

What STM does (as I understand from blog posts by the PyPy team) is provide real threading without having to worry about pickling.

And yes, it only helps if you have multithreaded programs. However, multithreaded and parallel programs are very relevant today.

I found that PyPy sometimes has unexpected slowdowns. When we were porting some offline processing tools from CPython to PyPy, the craziest ones were building strings via += and flattening lists via sum(arrays, []), both of which are much slower than on CPython.
There was a good blog post on this by Armin. Basically, if you have to concatenate strings, don't do so via += but use "".join([]).
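
A rough sketch of the two styles, in case it helps. The function names are made up for illustration, and this makes no claim about actual timings on either interpreter; measure with timeit on your own workload.

```python
def build_with_concat(parts):
    """Build a string by repeated +=, the pattern reported as slow on PyPy."""
    out = ""
    for p in parts:
        out += p
    return out


def build_with_join(parts):
    """Collect the pieces first, then join them once at the end."""
    return "".join(parts)


pieces = [str(i) for i in range(1000)]

# Both approaches produce the same string; only the cost profile differs.
same = build_with_concat(pieces) == build_with_join(pieces)
print(same)
```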
I find that unexpected. Java has had string builder optimization for a long time, and CPython is much, much faster in this respect. It's not always easy to use "".join when building a string, so you end up having to build a separate array of strings in some cases. And building arrays isn't always that fast either. And [].join doesn't exist, so summing arrays is always kinda slow.
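
For the list case, one possible stand-in for the missing [].join is itertools.chain.from_iterable from the standard library. A small sketch (function names are hypothetical, and no timings are claimed here): sum(lists, []) creates a new list at every step, so the total work grows quadratically with the number of elements, while chaining copies each element once.

```python
from itertools import chain


def flatten_with_sum(lists):
    """Flatten by summing lists; each + allocates a new, longer list."""
    return sum(lists, [])


def flatten_with_chain(lists):
    """Flatten by chaining the inner lists and materializing once."""
    return list(chain.from_iterable(lists))


nested = [[1, 2], [3], [], [4, 5, 6]]

# Both give the same flat list.
print(flatten_with_sum(nested) == flatten_with_chain(nested))
```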

Anyway, all that is to say: I really like PyPy, and we use it a lot, but those _unexpected_ crazy slowdowns are unfortunate.