by fuzzyman 5357 days ago
I don't really know what you mean by supporting numpy on pypy "excludes" scipy (other than in the short term). If you mean you assumed that the pypy team would port all of scipy as well as numpy, that seems like an unrealistic expectation for an initial port!

An initial port seems like the only way forward from the point of view of the pypy team. I think it is unrealistic to expect the pypy team to take on the work of changing numpy so that it is more friendly to alternative implementations. I would certainly expect them to be involved in the discussion though.

The article also seems to miss that there is work ongoing to bring pypy support to Cython.

2 comments

Well, the big problem to solve with SciPy is that the codebase contains more C, more C++, and more Fortran than it does Python (http://www.ohloh.net/p/scipy). Part of why Python has succeeded in scientific computing is its integration of legacy codebases (via f2py, C extensions, and so on). There are probably man-years of work involved in devising a solution, which by the time it's complete may be basically irrelevant or, worse, may fragment the community.

I personally think that going down this rabbit hole (porting 10 years of scientific Python libraries to PyPy) would amount to an exercise in vanity rather than producing the kinds of revolutionary changes to array-oriented computing that need to happen soon to deal with the large-scale data processing challenges of the present and future. Having recently used GPUs to speed up statistical inference algorithms by a factor of 50 or more, I am not that motivated by a JIT beating C in some cases (as Travis wrote: "C speed is the wrong target").

Many in the SciPy community are convinced that NumPy will not provide the computational foundation that we need going forward, and they are going to step up and start building the next-generation NumPy (or whatever it's going to be called). We'd rather have more of the smartest computer scientists in the Python community focused on this problem (building more sophisticated data processing pipelines for use in Python) than on speeding up Python code that, by my estimation, doesn't matter that much.
Have you looked at Theano ( http://deeplearning.net/software/theano/ ) ? It is a Python-based JIT for GPUs. Using Python you can build the computation pipeline symbolically, and the formulas are automatically converted to GPU code and scheduled as deemed fit (this can be extended to multiple GPUs, and could theoretically scale to an even higher level).

I think this is a promising idea for the future of array-oriented computing, as it can make use of one more level of parallelism / scaling than the current NumPy paradigm, which is limited to executing one operation at a time, in the order the user specifies.
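To make the "build the pipeline symbolically, then compile it" idea concrete, here is a minimal pure-Python sketch. It is not Theano's API: the names `Var`, `Const`, `Op`, and `compile_expr` are hypothetical, and the "compile" step here just wraps a tree walk, whereas a real system would be free to reorder, fuse, or generate GPU code from the same graph.

```python
import operator


class Expr:
    """Base class: arithmetic on expressions builds a graph instead of computing."""

    def __add__(self, other):
        return Op(operator.add, self, wrap(other))

    def __mul__(self, other):
        return Op(operator.mul, self, wrap(other))


class Var(Expr):
    """A named symbolic input; its value is supplied only at run time."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, env):
        return env[self.name]


class Const(Expr):
    """A literal value embedded in the graph."""

    def __init__(self, value):
        self.value = value

    def evaluate(self, env):
        return self.value


class Op(Expr):
    """A binary operation node holding its two operand sub-graphs."""

    def __init__(self, fn, left, right):
        self.fn = fn
        self.left = left
        self.right = right

    def evaluate(self, env):
        return self.fn(self.left.evaluate(env), self.right.evaluate(env))


def wrap(value):
    """Promote plain Python numbers to Const nodes."""
    return value if isinstance(value, Expr) else Const(value)


def compile_expr(expr):
    """Hypothetical 'compile' step: returns a callable that evaluates the graph.

    A real backend could inspect the whole graph here before running anything.
    """
    def run(**inputs):
        return expr.evaluate(inputs)
    return run


# Nothing is computed while the formula is being written down...
x = Var("x")
y = Var("y")
f = compile_expr(x * y + 2)

# ...only when the compiled function is called with concrete inputs.
result = f(x=3, y=4)  # 3 * 4 + 2 = 14
```

The point of the design is that the full expression is known before any arithmetic happens, which is what gives a scheduler room that eager, one-operation-at-a-time evaluation does not.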

AFAIUI, what you're missing is that PyPy can (and often does) interface directly to C libraries, from RPython. So the prospect of re-implementing those specialized codebases isn't a real issue: only the CPython-API based wrappers would need re-implementing. I believe those are a small part of the total code you mention.
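As a rough illustration of calling into a C library without going through the CPython C API, here is a small sketch using `ctypes` from the standard library, which PyPy also supports. Note this is a swapped-in technique: it is application-level `ctypes`, not the RPython-level interface mentioned above. It also assumes a platform where the C math library can be found, falling back to symbols already loaded in the process.

```python
import ctypes
import ctypes.util

# Locate the C math library; if it cannot be found by name, fall back to
# the symbols already loaded into the current process (POSIX behaviour).
libm_path = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_path) if libm_path else ctypes.CDLL(None)

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Call the C library's cos() directly, with no extension module involved."""
    return libm.cos(x)


value = c_cos(0.0)
```

Only the thin binding layer is Python-specific here; the C code being called is untouched, which is the distinction being drawn between the specialized codebases and their CPython-API wrappers.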
But those packages do not just depend on C libraries. They also depend on the numpy C API. If emulating the C API of CPython is too much non-fun work, I would expect the same to be true for the numpy C API.
By 'excludes' scipy I mean that the numpy C API won't be available, so there's a massive hill to climb (as best I understand it) to support most of scipy. Re. getting the pypy team to do the whole port - agreed, that'd be entirely unrealistic! Re. initial port - since there is more than one way forward and I want to financially support the project (having pledged £600 earlier), I'd like to understand the risks and benefits of the options.