by bachback 4693 days ago
Nice proposal. I think the problem is numpy itself. If you could just do pip install numeric_package then nobody can complain. I don't quite understand why a package has to depend on LINPACK. I will probably switch to julia-lang, because numpy is (at least for me) not that great to work with.
4 comments

NumPy is a full MATLAB for Python, not a simple drop-in statistics library. It has to depend on LINPACK because writing a full linear algebra library that performs well is damn hard and takes several researcher-years. Most serious scientific computing libraries and utilities depend on it, including Julia, I think. There is certainly room for simpler libraries for people not seriously into numeric computations, as the linked document indicates well.
NumPy does not depend on LINPACK; only SciPy does (NumPy only uses BLAS if you have one installed; it is optional).

The reason SciPy (and Julia, BTW) need BLAS/LAPACK is that it's the only way to get decent performance and reasonably accurate linear algebra. The alternative is writing your own implementation of something that has been used and debugged for 30 years, which does not seem like a good idea.
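
To make the accuracy point concrete, here is a minimal pure-Python sketch. It is not LAPACK's code; the function name `solve` and the test system are made up for illustration. It runs Gaussian elimination with and without partial pivoting, one of the standard safeguards a mature library applies, on a tiny system whose true solution is very close to [1, 1].

```python
def solve(a, b, pivot=True):
    """Solve a x = b by Gaussian elimination (hypothetical helper, works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        if pivot:
            # Partial pivoting: move the largest remaining entry of column k onto the diagonal.
            p = max(range(k, n), key=lambda i: abs(a[i][k]))
            a[k], a[p] = a[p], a[k]
            b[k], b[p] = b[p], b[k]
        # Eliminate column k from every row below the diagonal.
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x


# A system with a tiny leading entry; the exact solution is approximately [1, 1].
A = [[1e-20, 1.0], [1.0, 1.0]]
B = [1.0, 2.0]

naive = solve(A, B, pivot=False)
pivoted = solve(A, B, pivot=True)
print(naive, pivoted)
```

Without pivoting the answer comes back as [0.0, 1.0], which is badly wrong in the first component; with pivoting it is [1.0, 1.0]. A hand-rolled replacement has to rediscover every detail of this kind.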

This is what I don't understand at all. Imagine if somebody in a different area of computing said: oh, we solved that 30 years ago and now there is no room for improvement at all. Why can't this be done at least in C?
There is room for improvement, and it gets improved all the time (e.g. OpenBLAS is a recent contender). LAPACK is essentially an API for linear algebra, which is what allowed people to improve implementations and to benefit from them in older programs.

Think of it as the C library of numerical computing.
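
A rough sketch of the "API, not implementation" idea, in plain Python. None of these names come from BLAS or LAPACK (the real matrix-multiply routine is called `dgemm`); `matmul_reference`, `matmul_by_columns` and `matrix_power` are hypothetical. The point is only that caller code written against a fixed signature keeps working when a different implementation is plugged in.

```python
from typing import Callable, List

Matrix = List[List[float]]
MatMul = Callable[[Matrix, Matrix], Matrix]


def matmul_reference(a: Matrix, b: Matrix) -> Matrix:
    """Textbook triple loop: the 'reference implementation' of the interface."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                out[i][j] += a[i][k] * b[k][j]
    return out


def matmul_by_columns(a: Matrix, b: Matrix) -> Matrix:
    """Same contract, different internals: transpose b once, then take dot products."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def matrix_power(a: Matrix, exponent: int, matmul: MatMul) -> Matrix:
    """'Old program' written against the interface only; it never names a backend."""
    n = len(a)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = matmul(result, a)
    return result


shear = [[1.0, 1.0], [0.0, 1.0]]
print(matrix_power(shear, 5, matmul_reference))
print(matrix_power(shear, 5, matmul_by_columns))
```

Both calls give the same matrix; swapping the backend needed no change to `matrix_power`, which is roughly how an old program benefits from a newer BLAS.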

I'm confused how a library written in Fortran 90 can be called "the C library of numerical computing".
I was comparing LAPACK to the C library: nobody claims it is weird to use the C library, written > 30 years ago, instead of using something 'more modern'. Everybody uses the C library for system/low-level programming, and the same is true in numerical computing w.r.t. BLAS/LAPACK. NumPy/SciPy use it, R uses it, Julia uses it, MATLAB uses it, Octave uses it.
You'll be disappointed to learn Julia depends on BLAS, LAPACK and librmath. That shouldn't dissuade you from trying it, though: it is a pretty cool language.
numpy has all sorts of awful C bindings which make it less than versatile in environments where you want pure Python. It's great from a performance point of view, but horrible for compatibility.

Google App Engine used to suffer because of this (more specifically, it still restricts your runtime to pure Python, but now you can at least import numpy). I believe the PyPy folks have also had their own set of struggles with numpy compatibility, although I'm not sure what the state of that is at present.

In any case, I think these compatibility concerns alone make a strong argument for including simple statistics tooling in the standard library.

How does that work for SQLite 3, which _is_ part of the Python standard library?

I would actually prefer to have numpy included before those statistics functions.

OK, that's a pretty good counterexample. Touché. :-p

GAE just avoids it altogether (except locally, where you have CPython and use it to stub out core services hosted on the cloud runtime). You simply can't import sqlite3 on GAE when running on the cloud runtime, nor can you really use it as an external dependency.

I'm not really up on the details, but the PyPy website claims they've gotten around this by implementing a pure Python equivalent of the CPython stdlib library (http://pypy.org/compat.html).

I would put forward that SQLite3 is probably a pretty easy include in most C projects compared to whatever numpy would likely require. That said, I'm not qualified to assess this, being neither a numpy, Python core, nor sqlite3 dev.

All of this aside, it's worth mentioning that the standard lib itself includes and depends on some other C-only libraries. So it's not unprecedented. In principle, you'd want the standard lib to have as much pure Python as possible (PyPy kind of takes this to the ultimate extreme from what I can gather), but this isn't always practical (great example of "practicality beats purity" if you ask me).

Speaking of which, if it's cool to have `sqlite3` in the standard lib as part of the included batteries, why not mean and variance and the like? :D
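
For a sense of scale, mean and sample variance fit in a few lines of pure Python with no C dependency beyond the interpreter itself. This is only a sketch: the function names and error messages are my own and are not taken from the linked proposal.

```python
import math


def mean(data):
    """Arithmetic mean; math.fsum avoids losing precision when summing floats."""
    values = list(data)
    if not values:
        raise ValueError("mean requires at least one data point")
    return math.fsum(values) / len(values)


def variance(data):
    """Sample variance (divides by n - 1), computed in two passes for stability."""
    values = list(data)
    if len(values) < 2:
        raise ValueError("variance requires at least two data points")
    centre = mean(values)
    return math.fsum((x - centre) ** 2 for x in values) / (len(values) - 1)


print(mean([1, 2, 3, 4]))
print(variance([1, 2, 3, 4]))
```

Even here there are details to get right: a plain `sum([1e100, 1.0, -1e100])` returns 0.0, while `math.fsum` on the same list returns 1.0.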

PyPy has its own implementation of the sqlite module based on CFFI, as well as other stdlib modules that wrap C libraries (a lot of the libraries written in C for the sake of performance in CPython are just pure Python in PyPy — CPython normally includes both pure Python and not in those cases). numpy is far more complex because it's a far bigger API than anything like sqlite, though work is progressing on it.
numpy allows you to do some pretty low-level stuff (like creating a buffer from a raw pointer), which GAE wants to avoid for obvious security reasons. Same rationale as why ctypes (part of the stdlib) is not available.

I have never used numpy on GAE, but I would suspect some of it is not enabled for those reasons.
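
To illustrate what "a buffer from a raw pointer" looks like, here is a small ctypes sketch for CPython (it says nothing about how GAE actually sandboxes things; the variable names are just for the example). An integer address is turned back into a writable array with no copy and no bounds or lifetime checks.

```python
import ctypes

# An ordinary ctypes array that owns four doubles.
owner = (ctypes.c_double * 4)(1.0, 2.0, 3.0, 4.0)

# Its raw memory address, as a plain Python integer.
address = ctypes.addressof(owner)

# Reinterpret that integer as an array again: same memory, no copy, no checks.
alias = (ctypes.c_double * 4).from_address(address)

# Writing through the alias changes the original array.
alias[0] = 99.0
print(owner[0])
```

Here the address is valid because `owner` is still alive; nothing stops code from passing any other integer instead, which is the kind of thing a restricted runtime would want to rule out.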