Hacker News
by mvanveen 4693 days ago
numpy has all sorts of awful C bindings which make it less than versatile in environments where you want pure Python. It's great from a performance point of view, but horrible for compatibility.

Google App Engine used to suffer because of this (more specifically, it still restricts your runtime to pure Python, but now you can at least import numpy). I believe the PyPy folks have also had their own struggles with numpy compatibility, although I'm not sure what the state of that is at present.

In any case, I think these compatibility concerns alone make a strong argument for including simple statistics tooling in the standard library.


How does that work for SQLite 3, which _is_ part of the Python standard library?

I would actually prefer to have numpy included before those statistics functions.

OK, that's a pretty good counterexample. Touché. :-p

GAE just avoids it altogether (except locally, where you have CPython and use it to stub out core services hosted on the cloud runtime). You simply can't import sqlite3 on GAE when running on the cloud runtime, nor can you really use it as an external dependency.

I'm not really up on the details, but the PyPy website claims they've gotten around this by implementing a pure Python equivalent of the CPython stdlib (http://pypy.org/compat.html).

I would put forward that SQLite3 is probably a pretty easy include in most C projects compared to whatever numpy would likely require. That said, I'm not qualified to assess this, being neither a numpy, a Python core, nor a sqlite3 dev.

All of this aside, it's worth mentioning that the standard lib already includes and depends on some other C-only libraries, so it's not unprecedented. In principle, you'd want the standard lib to have as much pure Python as possible (PyPy kind of takes this to the ultimate extreme from what I can gather), but this isn't always practical (a great example of "practicality beats purity" if you ask me).

Speaking of which, if it's cool to have `sqlite3` in the standard lib as part of the included batteries, why not mean and variance and the like? :D
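For a sense of how little code is being discussed, here's a rough pure Python sketch of mean and sample variance. The function names and the choice of the n - 1 denominator are my own assumptions for illustration, not taken from any actual proposal.

```python
import math


def mean(data):
    """Arithmetic mean of a non-empty iterable of numbers."""
    values = list(data)
    if not values:
        raise ValueError("mean requires at least one data point")
    # math.fsum limits the rounding error of naive float summation.
    return math.fsum(values) / len(values)


def variance(data):
    """Sample variance (n - 1 denominator) of at least two numbers."""
    values = list(data)
    n = len(values)
    if n < 2:
        raise ValueError("variance requires at least two data points")
    m = mean(values)
    # Two-pass formula: sum the squared deviations from the mean.
    return math.fsum((x - m) ** 2 for x in values) / (n - 1)
```

Nothing here touches C, so it would run unchanged on CPython, PyPy, or a restricted runtime like GAE.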

PyPy has its own implementation of the sqlite module based on CFFI, as well as of other stdlib modules that wrap C libraries. (A lot of the modules written in C for the sake of performance in CPython are just pure Python in PyPy; CPython normally ships both a pure Python and a C version in those cases.) numpy is far more complex because its API is far bigger than anything like sqlite's, though work is progressing on it.
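The "both pure Python and C" arrangement usually looks something like the sketch below: define the pure Python version, then try to swap in the C accelerator if the runtime has one. `py_bisect_left` is a hypothetical name of mine, and `_bisect` is a private CPython module that I'm assuming may or may not be importable on a given runtime.

```python
def py_bisect_left(a, x):
    """Pure Python fallback: leftmost index at which x can be inserted in sorted a."""
    lo, hi = 0, len(a)
    while lo < hi:
        mid = (lo + hi) // 2
        if a[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo


try:
    # Prefer the C-accelerated version when this runtime provides it.
    from _bisect import bisect_left
except ImportError:
    # Otherwise fall back to the pure Python implementation above.
    bisect_left = py_bisect_left
```

Callers just use `bisect_left` and never need to know which implementation they got.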
numpy allows you to do some pretty low-level stuff (like creating a buffer from a raw pointer), which GAE wants to avoid for obvious security reasons. Same rationale as why ctypes (part of the stdlib) is not available.

I have never used numpy on GAE, but I would suspect some of it is not enabled for those reasons.
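To illustrate the raw pointer concern, here's a small ctypes sketch (the helper name `read_raw` is hypothetical). It only reads memory the script allocated itself, but the same call accepts any integer address, which is presumably the kind of thing a sandbox wants to rule out.

```python
import ctypes


def read_raw(address, size):
    """Copy `size` bytes starting at a raw memory address into a bytes object."""
    # string_at trusts the caller: a bad address can crash the process
    # or expose memory the code should never see.
    return ctypes.string_at(address, size)


# Allocate a small C buffer that we own, then read it back through its address.
buf = ctypes.create_string_buffer(b"hello")
address = ctypes.addressof(buf)
copied = read_raw(address, 5)
```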