by sandGorgon 2953 days ago
I still don't understand why PyPy hasn't been adopted by Google or Dropbox (the standard bearers of the Python ecosystem) as a forward-looking investment. It is constantly underfunded (https://pypy.org/py3donate.html), and given the potential of the work that's happening, I don't understand why these guys don't write cheques for a few hundred k.

After I ran the experimental evaluation, I had similar thoughts. If PyPy ever matches the current version of CPython, I'm not sure why one wouldn't use PyPy over CPython. The biggest hurdle is matching support for popular libraries like NumPy, TensorFlow, Pandas, SciPy, etc. I know they're working on supporting these; it's definitely a lot of work, easier said than done.
PyPy doesn't speed up all workloads; sometimes the JIT overhead is just too large to still get a speedup in the end. E.g. the Oil shell runs slower under PyPy: http://www.oilshell.org/blog/2018/03/04.html#toc_13
(author here) Thanks for the reference :)

I should also add: Suppose PyPy was twice as fast as CPython for a given workload, but it also used twice as much memory.

I doubt Google or Dropbox would use it in that case. On large clusters, memory usage probably contributes to the need to buy machines more than CPU usage (CPU utilization can be low; memory utilization is higher).

I've personally rewritten some Python code as a C++ extension module and gotten a 5x decrease in memory usage across thousands of machines.

(As far as I understand, this is the typical tradeoff for PyPy: it's faster but uses more memory. I'm happy to hear more detail on this though.)

I'm not so sure about this. The single biggest factor in data centers is cooling costs. So it stands to reason that if your CPU usage drops, so do your cooling costs (or you run double your workloads per unit of cooling).

Memory is cheap OTOH.

Memory may be cheap, but it is a limiting factor for a lot of workloads. If your memory consumption goes up by 2x, you need to provision 2x the nodes.

In fact, I've heard that one of the reasons IBM is investing in Swift is that it uses so much less memory than jitted Java. Apparently, most of their cloud workloads sit idle most of the time, meaning the number of client VMs/jobs/whatever they can put on a machine is almost entirely determined by memory use.
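
To make the provisioning arithmetic concrete, here is a rough sketch. The job count, per-job footprint and node size are made-up numbers, and `nodes_needed` is a hypothetical helper, not anything from a real scheduler. It assumes memory is the only binding constraint, which is the idle-workload case described above:

```python
import math


def nodes_needed(jobs: int, mem_per_job_gb: float, node_mem_gb: float) -> int:
    """Nodes required when memory (not CPU) limits how many jobs fit per node."""
    jobs_per_node = math.floor(node_mem_gb / mem_per_job_gb)
    if jobs_per_node < 1:
        raise ValueError("a single job does not fit on one node")
    return math.ceil(jobs / jobs_per_node)


# Illustrative numbers only: 1024 mostly idle jobs on 64 GB nodes.
baseline = nodes_needed(jobs=1024, mem_per_job_gb=2.0, node_mem_gb=64.0)
doubled = nodes_needed(jobs=1024, mem_per_job_gb=4.0, node_mem_gb=64.0)
print(baseline, doubled)  # 32 64
```

With these numbers, doubling the per-job footprint doubles the node count regardless of how little CPU each job uses.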

Also, to a large extent memory usage is performance, and is power consumption. A random memory access is still on the order of 50-60ns; in that time you can do several hundred ALU operations. (This is assuming the memory is actually used and not just sitting around.) See for example http://www.ists.dartmouth.edu/library/202.pdf
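
A back-of-the-envelope check of the "several hundred" figure, assuming a 3 GHz core that completes 2 to 4 simple ALU operations per cycle. Both numbers are illustrative assumptions, not measurements, and `alu_ops_during_access` is a hypothetical helper:

```python
def alu_ops_during_access(latency_ns: float, clock_ghz: float, ops_per_cycle: int) -> int:
    """ALU operations that could complete while one memory access is in flight."""
    cycles = latency_ns * clock_ghz  # 1 GHz is one cycle per nanosecond
    return round(cycles * ops_per_cycle)


# Assumed: 3 GHz clock, 2 to 4 ALU ops per cycle, 50-60 ns access latency.
low = alu_ops_during_access(latency_ns=50, clock_ghz=3.0, ops_per_cycle=2)
high = alu_ops_during_access(latency_ns=60, clock_ghz=3.0, ops_per_cycle=4)
print(low, high)  # 300 720
```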

"...computation is essentially free, because it happens “in the cracks” between data fetch and data store; on a clocked processor, there is little difference in energy consumption between performing an arithmetic operation and performing a no-op." -- Andrew Black in Object-oriented programming: some history, and challenges for the next fifty years

Yup - long running and very repetitive processes are the best fit for PyPy. If you have a slow but short-lived process then PyPy is not going to improve things for you.
This is exactly why PyPy blew both Cannoli and CPython away in the microbenchmarks used for analysis. As I've said elsewhere, the focus was on comparing Cannoli (unoptimized) to Cannoli (optimized) and not a direct comparison to CPython or PyPy. However, the microbenchmarks were running iterations of 1-10 million, giving the JIT plenty of time to find beneficial traces in the PyPy interpreter.
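
One way to look for the warm-up effect is to time the same hot loop in successive batches and run the script under both interpreters. Under a tracing JIT the later batches would typically be faster than the first, while under CPython they should stay roughly flat. A minimal sketch; the loop body, function names and iteration counts are illustrative and not taken from the Cannoli benchmarks:

```python
import time
from typing import List


def hot_loop(n: int) -> int:
    """A deliberately repetitive loop, the kind a JIT can trace and optimize."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def time_batches(batches: int, iterations: int) -> List[float]:
    """Return wall-clock seconds for each successive batch of the same loop."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        hot_loop(iterations)
        timings.append(time.perf_counter() - start)
    return timings


if __name__ == "__main__":
    # Run once with `python` and once with `pypy3`, then compare per-batch times.
    for index, seconds in enumerate(time_batches(batches=5, iterations=1_000_000)):
        print(f"batch {index}: {seconds:.4f}s")
```

A short-lived script is comparable to only ever running the first batch, which is where any warm-up cost would show up.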
BTW, the test I did where PyPy was slower than CPython ran for a minute or so (IIRC). It wasn't that long lived, but it wasn't like the "instant" invocation you often see with shell scripts either.

I don't think the JIT warmup was the main issue there; I think it was PyPy's lack of ability to optimize certain kinds of code combined with increased memory usage.

I am hoping Facebook will fund PyPy, given that Instagram runs on Python.

It seems Google and Dropbox are not interested. Google is working on Grumpy, Dropbox worked on Pyston.

I thought so too. But Google has a history of funding multiple initiatives in parallel. Grumpy is a very different effort to translate Python to Go - but PyPy is more or less drop-in.

Pyston also seems dead - nobody has committed in a year. You have to give kudos to the PyPy devs. The level of passion it takes to contribute countless hours to an underfunded, high-impact project must be incredible.

It seems that Instagram is making tweaks here and there, as reported in this week's LWN for example:

https://lwn.net/SubscriberLink/754163/a38214c50e7b3ece/

To be honest I barely use Python, but when I read LWN, etc., I get the impression multiple people are trying to solve multiple problems with Python (from the GIL onwards).

It seems like a hard problem, given the dynamic nature of the language and the unwillingness to break the C API.

Thanks for linking to that!

> Thomas Wouters asked if he had looked at PyPy. Shapiro said the company had, but there was only a modest bump in performance for its workload. He was not the one who did the work, however. Wouters noted that PyPy is more than "Python with a JIT" because it has its own data model as well.

This is interesting. How much was a "modest bump" in performance? And why was the bump in performance not a reason for adoption? Does PyPy break a lot of stuff?

Oh and this

> Some of what Shapiro presented did not sit well with Guido van Rossum, who loudly objected to Shapiro's tone, which was condescending, he said. Van Rossum thought that Shapiro did not really mean to be condescending, but that was how he came across and it was not appreciated. The presentation made it sound like Shapiro and his colleagues were the first to think about these issues and to recognize the inefficiencies, but that is not the case. Shapiro was momentarily flustered by the outburst and its vehemence, but got back on track fairly quickly.

> Shapiro's overall point was that he felt Python sacrificed its performance for flexibility and generality, but the dynamic features are typically not used heavily in performance-sensitive production workloads. So he believes it makes sense to optimize for the common case at the expense of the less-common cases. But Shapiro may not be aware that the Python core developers have often preferred simpler, more understandable code that is easier to read and follow, over more complex algorithms and data structures in the interpreter. Some performance may well have been sacrificed for readability.

Grumpy development has stopped, at least on the Grumpy GitHub repo.
In Google's case, it appears they find it more worthwhile to migrate their Python code to Go and Swift than to improve Python runtimes.

Remember Unladen Swallow?

Unladen Swallow was never an official attempt to solve that. It was the initiative of one guy (http://qinsb.blogspot.fr/2011/03/unladen-swallow-retrospecti...), during an internship, with basically zero support. So Google never even considered it a serious solution.

Besides, when they made the Go decision, they were still using Python 2.4. It had poor Unicode support, bad async I/O, no multiprocessing pool, an awful packaging and deployment story, and the prospect of a 3.x release breaking everything.

It made a lot of sense, business-wise. If you have to rewrite your code base, better to rewrite it in a language you control (no PSF to fight against), specifically designed for your workload, fixing all those quirks, plus running faster and eating less memory.

Today's Python Unicode support is top notch, asyncio makes networking code easy, you can use multiple cores easily, deployment is pretty much a solved problem, and 3.x works for pretty much 90% of people. You even have a hook to plug a JIT into CPython, waiting to be used, and type hints to facilitate the handling of huge code bases.

Their decision would probably have been different then, although that wouldn't have made Python any easier to speed up. But if a few smart people managed to make it work, by rewriting Python in Python, no less, I'm sure a dedicated Google team would have done great.

They did it for the slug that was JS, after all. But they didn't have the luxury of being able to change the language for that, so they poured millions into it. Not one guy during an internship.

What kind of Python code are they migrating to Swift? On the server? If so, I'm a bit surprised they're already using Swift on the server. It's a cool language, but for Linux servers it seems really young.
Wouldn't the biggest issue still be that C modules either don't work or are slow? I'd imagine it's much better to be able to solve performance bottlenecks by using Cython/C than to have an overall faster runtime but no option to go further.
Engineers usually have more fun rewriting everything in “that new shiny tool that people are talking about”.

Managers enjoy avoiding conflicts.

Very rarely will someone in a position of power point to this kind of solution, which is going to be against the wishes of many employees anyway.

The Python ecosystem in general is severely underfunded despite all big players using it extensively.

I think one reason is that the community is doing too good of a job. The language is pretty sane, it solves most problems right, the libs and docs are good, and the general direction things take is reasonable. And it's free not only as in beer and as in freedom, but also free from business influences. The PSF is really giving away pretty much everything.

Everybody contributes a little (we have the Brett Cannon team from MS, the Guido team from Dropbox, the Alex Martelli team from Google, Mozilla even donated to PyPy, etc.). But it's nothing massive. Nobody said "OK, here is 10 million euros, solve the packaging problem".

Compare to JS: the language started out slow, with terrible design and no consensus on the direction to take. So eventually, people (Google first) poured a load of money into it until it became usable, and they had cleaner leadership. They had a huge problem to solve on the ever-expanding market that is the web platform. Of course, JS has the unfair advantage of a captive audience and a total monopoly in its field.

Remember Unladen Swallow? The "Google" attempt to JIT Python? It was just one guy during his internship (http://qinsb.blogspot.fr/2011/03/unladen-swallow-retrospecti...).

And look at the budget the PSF had in 2011 to help the community: http://pyfound.blogspot.fr/2012/01/psf-grants-over-37000-to-... I mean, even today they have to go through so many shenanigans for barely 20k (https://www.python.org/psf/donations/2018-q2-drive/).

But at the same time you hear people complaining that they can't migrate to Python 3 yet because they have millions of lines of Python. You hear from them when they want support extended for free, but never when it comes to supporting the community.

It's ridiculous.

Also compare to PHP: the creators made a business out of it, plain and simple.

Compare to Java/C#/Go: they're owned by huge players that have a lot of money invested.

Python really needs a sugar daddy so that we can tackle the few items remaining on the list:

- integrated steps to make an exe/rpm/deb/.app

- JIT that works everywhere

- mobile dev

- multi-core with fast and safe memory sharing

There are projects for that (Nuitka, Pyjion, Kivy, etc.), but they all lack manpower and money, and hence integration, performance, features, etc.

You need a simple way to code some GUI, make it work on mobile or desktop, turn it into an exe and distribute it.

You need a simple way to say "this is a long running process, JIT the hell out of it".

> So eventually, people (Google first) poured a load of money into it until it became usable, and they had cleaner leadership.

Google was not the first company to pour lots of money into JavaScript. Remember the first browser war?

Yes, and JS was not the center of it at all. Support was inconsistent, it was slow, leaked memory...

Actually, most actors tried to find a way to replace JS with something in house (ActionScript, ActiveX, Java applets) so they could dominate the market.

Most scripting languages were slow at that point, and memory leaks are still rampant in popular browser engines. Microsoft had an enormous team working on "JScript". I wouldn't be surprised if the size of that team significantly exceeded the size of the V8 team in the early days.

Netscape and later Mozilla were also heavily investing in JavaScript long before Google came on the scene.