PyPy doesn't speed up all workloads; sometimes the JIT overhead is just too large to leave a net speedup in the end. E.g. the Oil shell runs slower under PyPy: http://www.oilshell.org/blog/2018/03/04.html#toc_13
I should also add: Suppose PyPy was twice as fast as CPython for a given workload, but it also used twice as much memory.
I doubt Google or Dropbox would use it in that case. On large clusters, memory usage probably contributes to the need to buy machines more than CPU usage (CPU utilization can be low; memory utilization is higher).
I've personally rewritten some Python code as a C++ extension module and gotten a 5x decrease in memory usage across thousands of machines.
(As far as I understand, this is the typical tradeoff for PyPy: it's faster but uses more memory. I'm happy to hear more detail on this though.)
I'm not so sure about this. The single biggest cost factor in data centers is cooling. So it stands to reason that if your CPU usage drops, so do your cooling costs (or you run double your workloads per unit of cooling).
Memory may be cheap, but it is a limiting factor for a lot of workloads. If your memory consumption goes up by 2x, you need to provision 2x the nodes.
In fact, I've heard that one of the reasons IBM is investing in Swift is because it uses so much less memory compared to jitted Java. Apparently, most of their cloud workloads sit idle most of the time, meaning the number of client VMs/jobs/whatever they can put on a machine is almost entirely determined by memory use.
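To put rough numbers on the provisioning argument above, here is a minimal sketch. Everything in it is hypothetical: the function name `nodes_needed`, the job count, and the per-job and per-node figures are made up for illustration, and it ignores bin-packing fragmentation and per-node overhead.

```python
import math

def nodes_needed(jobs, mem_per_job_gb, cores_per_job, node_mem_gb, node_cores):
    """Nodes required for a fleet of identical jobs (hypothetical model).

    Each resource is computed separately; the scarcer one decides.
    """
    by_mem = math.ceil(jobs * mem_per_job_gb / node_mem_gb)
    by_cpu = math.ceil(jobs * cores_per_job / node_cores)
    return max(by_mem, by_cpu)

# Assumed baseline: 1000 mostly idle jobs on 64 GB / 32-core nodes.
baseline = nodes_needed(1000, mem_per_job_gb=2.0, cores_per_job=0.25,
                        node_mem_gb=64, node_cores=32)

# Assumed "2x faster, 2x memory" runtime: CPU need halves, memory doubles.
faster_but_fatter = nodes_needed(1000, mem_per_job_gb=4.0, cores_per_job=0.125,
                                 node_mem_gb=64, node_cores=32)

print(baseline, faster_but_fatter)
```

With these assumed inputs the fleet is memory-bound in both cases, so halving CPU use buys nothing and doubling memory roughly doubles the node count. If the jobs were CPU-bound instead, the conclusion would flip.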
Also, memory usage is, to a large extent, performance and power consumption. A random memory access still takes on the order of 50-60 ns; in that time you can do several hundred ALU operations. (This assumes the memory is actually being used and not just sitting around.) See for example http://www.ists.dartmouth.edu/library/202.pdf
"...computation is essentially free, because it happens “in the cracks” between data fetch and data store; on a clocked processor, there is little difference in energy consumption between performing an arithmetic operation and performing a no-op." -- Andrew Black in Object-oriented programming: some history, and challenges for the next fifty years
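The "several hundred ALU operations" figure is easy to sanity-check with back-of-the-envelope arithmetic. The 3 GHz clock and 2 ALU operations per cycle below are assumptions for illustration, not measurements of any particular CPU; only the 50-60 ns latency comes from the comment above.

```python
def alu_ops_per_access(latency_ns, clock_ghz, ops_per_cycle):
    """ALU operations that fit in one memory access (rough estimate)."""
    cycles = latency_ns * clock_ghz  # ns * (cycles per ns)
    return cycles * ops_per_cycle

# Assumed: 3 GHz core sustaining 2 ALU ops per cycle.
low = alu_ops_per_access(50, clock_ghz=3.0, ops_per_cycle=2)
high = alu_ops_per_access(60, clock_ghz=3.0, ops_per_cycle=2)
print(low, high)
```

Under those assumptions one random access costs about as much as 300 to 360 ALU operations, which is in line with the claim.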
Yup - long running and very repetitive processes are the best fit for PyPy. If you have a slow but short-lived process then PyPy is not going to improve things for you.
This is exactly why PyPy blew both Cannoli and CPython away in the microbenchmarks used for analysis. As I've said elsewhere, the focus was on comparing Cannoli (unoptimized) to Cannoli (optimized), not on a direct comparison to CPython or PyPy. However, the microbenchmarks ran 1-10 million iterations, giving the JIT plenty of time to find beneficial traces in the PyPy interpreter.
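One way to see the warm-up effect for yourself is to time the same hot loop in consecutive batches and compare the first batch with the later ones. This is a minimal sketch, not the benchmark used in the Cannoli analysis; `work`, `time_batches`, and the batch sizes are made up for illustration.

```python
import time

def work(n):
    """A small, repetitive loop of the kind a tracing JIT handles well."""
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total

def time_batches(batches=5, n=200_000):
    """Run work(n) several times and return the wall-clock time of each run."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings

timings = time_batches()
for k, seconds in enumerate(timings):
    print(f"batch {k}: {seconds * 1000:.2f} ms")
```

Run the same file under both interpreters. If warm-up matters for your workload, the early batches under PyPy should be noticeably slower than the later ones, while CPython's batches should stay roughly flat. A process that exits before the later batches never sees the benefit.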
BTW, the test I did where PyPy was slower than CPython ran for a minute or so (IIRC). It wasn't that long-lived, but it wasn't like the "instant" invocation you often see with shell scripts either.
I don't think the JIT warmup was the main issue there; I think it was PyPy's inability to optimize certain kinds of code, combined with increased memory usage.