by insertion 4242 days ago

I noticed that PyPy does not show any real speed improvement over CPython in these benchmarks: https://www.techempower.com/benchmarks/

If you filter by Python, you can compare some results running on both CPython and PyPy. I would be curious to know what it is about these benchmarks that makes PyPy perform poorly. I would also be interested to see how Nuitka performs.

At the moment I'm also very excited about Pyston from Dropbox: https://github.com/dropbox/pyston

3 comments

Beltiras 4242 days ago

I'm supremely sceptical about those benchmarks. Go take a look at the code being tested. I would welcome a serious look at development time versus resources used, using code that is plausible in production. I read the code used to test Django. It's not reasonable code.

saym 4241 days ago

It's an open-source benchmark comparison. If you can improve the tests, shoot them a pull request!

https://github.com/TechEmpower/FrameworkBenchmarks

gamesbrainiac 4242 days ago

I think PyPy is of interest because of pypy-stm, which attempts to circumvent the GIL, and not only for its speed benefits.

wyldfire 4241 days ago

Correct me if I'm wrong here, but isn't that only helpful if you have multithreaded Python programs? I have found that if my process is too slow, I can consider porting it to numpy/numba or cython, using pypy, or dividing up the work using multiprocessing. The multiprocessing module is barely more work than the threading module, and completely avoids the GIL AFAIK.
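
To make the "barely more work" comparison concrete, here is a minimal sketch. It uses the standard-library concurrent.futures pools rather than the threading and multiprocessing modules directly (a swap made here for brevity, not something the comment specifies), and cpu_bound plus the two run_* helpers are hypothetical names invented for illustration.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def cpu_bound(n):
    """Hypothetical CPU-heavy task: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_with_threads(inputs, workers=4):
    # Threads share one interpreter, so on CPython the GIL serializes this work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, inputs))


def run_with_processes(inputs, workers=4):
    # Each worker process has its own interpreter and its own GIL.
    # Arguments and results are pickled to cross the process boundary.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(cpu_bound, inputs))


if __name__ == "__main__":
    inputs = [200_000] * 8
    # Both versions should agree; only the executor class differs.
    assert run_with_threads(inputs) == run_with_processes(inputs)
```

The two helpers differ only in the executor class, which is roughly the point being made. The process version depends on the function and its arguments being picklable.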

gamesbrainiac 4241 days ago

Well, multiprocessing (the module) is problematic because not everything can be pickled. If you are working with simple functions this can be fine, but often we use third-party libraries that use lambdas, which can't be pickled. Often you don't know why something can't be pickled.
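
A small sketch of the pickling limitation, using only the standard pickle module; can_pickle and double are hypothetical helper names. Functions are pickled by reference (module plus name), so a lambda, which has no importable name, is rejected.

```python
import pickle


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def double(x):
    # A module-level function is pickled by reference (module + name).
    return x * 2


if __name__ == "__main__":
    print("named function:", can_pickle(double))
    print("lambda:", can_pickle(lambda x: x * 2))
```

Run as a script, the named function should pickle and the lambda should not. A lambda buried inside a third-party object fails the same way, which is why the error can be hard to trace.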

What STM does (as I understand from blog posts by the PyPy team) is provide real threading without having to worry about pickling.

And yes, it only helps if you have multithreaded programs. However, multithreaded and parallel programs are very relevant today.

ant6n 4242 days ago

I found that PyPy sometimes has unexpected slowdowns. When we were porting some offline processing tools from CPython to PyPy, the craziest ones were building strings via += and flattening lists via sum(arrays, []), both of which are much slower than on CPython.

Fede_V 4241 days ago

There was a good blog post on this by Armin. Basically, if you have to concatenate strings, don't do so via += but use "".join([]).
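
A minimal sketch of the two approaches, with a rough timing harness so they can be compared on whichever interpreter is at hand. The function names and the input size are assumptions made for illustration; no timings are claimed here.

```python
import timeit


def build_with_concat(parts):
    # Repeated += may copy the accumulated string on every iteration.
    # CPython can often resize the string in place; PyPy does not rely on that.
    result = ""
    for part in parts:
        result += part
    return result


def build_with_join(parts):
    # join computes the total length once and copies each piece once.
    return "".join(parts)


if __name__ == "__main__":
    parts = ["x"] * 20_000
    assert build_with_concat(parts) == build_with_join(parts)
    for fn in (build_with_concat, build_with_join):
        seconds = timeit.timeit(lambda: fn(parts), number=5)
        print(f"{fn.__name__}: {seconds:.3f}s")
```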

ant6n 4240 days ago

I find that unexpected. Java has had string builder optimization for a long time, and CPython is much, much faster in this respect. It's not always easy to use "".join when building a string, so you end up having to build a separate array of strings in some cases. And building arrays isn't always that fast either. And [].join doesn't exist, so summing arrays is always kinda slow.

Anyway, all that is to say: I really like PyPy, and we use it a lot, but those _unexpected_ crazy slowdowns are unfortunate.
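
On the "[].join doesn't exist" point, the closest standard-library equivalents are probably itertools.chain.from_iterable or an explicit extend loop. A minimal sketch follows; the function names are hypothetical, and how each variant performs on PyPy specifically is not measured here.

```python
from itertools import chain


def flatten_with_sum(lists):
    # Each + builds a brand-new list, so total work grows quadratically.
    return sum(lists, [])


def flatten_with_chain(lists):
    # chain.from_iterable walks every element exactly once.
    return list(chain.from_iterable(lists))


def flatten_with_extend(lists):
    # Appends each sublist onto one growing result list.
    result = []
    for sub in lists:
        result.extend(sub)
    return result


if __name__ == "__main__":
    data = [[i, i + 1] for i in range(1000)]
    assert flatten_with_sum(data) == flatten_with_chain(data) == flatten_with_extend(data)
```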
