by dalke 4693 days ago
Well, help(scipy.optimize.nonlin.Anderson) has the same problem, but you're right in that that failure mode is rare, and that numpy/scipy has good documentation. However, in the context of a stats library, I think it's okay to point out that scipy.stats has some annoying parts. ;)

In all honesty, I seldom use NumPy and rarely use SciPy, so I can't judge that deeply. I know that when I read their respective code bases I get a bit bewildered by the many "import *" and other oddities. It doesn't feel right to me. I know the reason for most of the choices - to reduce API hierarchy and simplify usability for their expected end-users - but their expectations don't match mine.

So I looked at more of the documentation. I started with scipy/integrate/quadpack.py. The docstring for quad() says, in essence, "this docstring isn't long enough, so call quad_explain() to get more documentation." I've never seen that technique used before. The Python documentation says "see this URL" for those cases.

Again, this is a difference in expectations. I argue that NumPy and Python have different end-users in mind. Which is entirely reasonable - they do! But it means that it's very difficult to simply say "add numpy to part of the standard library."

There's also a level of normalization that I would want should numpy be part of the standard library. For example, does out-of-range input raise ValueError or RuntimeError? scipy/ndimage/filters.py does both, and I don't understand the distinction between the two.
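To make the inconsistency concrete, here is a minimal sketch. The two validators are hypothetical and are not copied from scipy/ndimage/filters.py; they only illustrate how similar bad-input checks can end up raising different exception types, and what that means for a caller.

```python
def check_size(size):
    """Hypothetical validator, not taken from scipy.ndimage."""
    if size < 1:
        # One check reports bad input as ValueError ...
        raise ValueError("size must be at least 1")
    return size


def check_mode(mode):
    """Hypothetical validator, not taken from scipy.ndimage."""
    if mode not in ("reflect", "constant", "nearest"):
        # ... while another reports the same kind of problem as RuntimeError.
        raise RuntimeError("unsupported mode: %r" % (mode,))
    return mode


def safe_call(func, arg):
    # Neither exception type is a subclass of the other, so a caller has
    # to remember to catch both.
    try:
        return func(arg)
    except (ValueError, RuntimeError) as err:
        return "rejected: %s" % err
```

A caller who only catches ValueError would let the second kind of failure propagate, which is the sort of surprise I'd want normalized away.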

Now, in the larger sense, I know the history. RuntimeError was more common in Python, and used as a catch-all exception type. Its existence in numpy reflects its long heritage. It's hard to change that exception type because programs might depend on it.

But it means that integrating all of numpy into the standard library is not going to work: either it breaks existing numpy-based programs, or the merge inherits a large number of oddities that most Python programmers will not be comfortable with.


Actually, I don't think the import * in numpy is anything other than a historical artefact. Numpy just happens to be one of the oldest Python libraries still in wide use (considering numpy started as Numeric), as you point out. As for import speed, have you considered using a lazy import in your script?
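By a lazy import I mean something like the sketch below. It uses the standard-library statistics module as a stand-in for a heavier package such as numpy, and the function names are made up for illustration.

```python
import sys


def mean(values):
    # The import runs on the first call, not at script startup. Later
    # calls find the module already cached in sys.modules.
    import statistics  # stand-in for a heavier package such as numpy
    return statistics.mean(values)


def is_loaded(name):
    # Helper to check whether a module has been imported yet.
    return name in sys.modules
```

A script that never calls mean() never pays for the import, which helps when only some code paths need the heavy dependency.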

I don't see numpy being integrated into Python anytime soon. I don't think it would bring much, and one would have to drop the performance enhancements that rely on BLAS/LAPACK.

I think installation has improved a lot, and once pip + wheel matures, it should be easy to pip install numpy on Windows.

I've asked on the numpy mailing list. The "import *" was a design decision, and it can no longer be reversed without breaking existing code.

For example, from http://mail.scipy.org/pipermail/numpy-discussion/2008-July/0... :

Robert Kern: Your use case isn't so typical and so suffers on the import time end of the balance

Stéfan van der Walt: I.e. most people don't start up NumPy all the time -- they import NumPy, and then do some calculations, which typically take longer than the import time. ... You need fast startup time, but most of our users need quick access to whichever functions they want (and often use from an interactive terminal).

I went back to the topic last year. Currently 25% of the import time is spent building some functions which are then exec'ed. At every single import. I contributed a patch, which has been hanging around for a year. I came back to it last week. I'll be working on an updated patch.
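For readers who haven't seen the pattern, here is a rough sketch of what "building functions which are then exec'ed" can look like. This is not numpy's actual code; build_with_exec and the generated wrappers are hypothetical, and only show why the cost recurs whenever such code runs at module level.

```python
import operator


def build_with_exec(names):
    """Generate wrapper functions from source text and exec them."""
    namespace = {"operator": operator}
    for name in names:
        source = (
            "def %s(a, b):\n"
            "    return operator.%s(a, b)\n" % (name, name)
        )
        # The source string is compiled and executed every time this
        # runs; at module level that means on every import.
        exec(source, namespace)
    return {name: namespace[name] for name in names}
```

Writing the same functions out as ordinary def statements lets Python reuse cached bytecode instead of compiling source strings on each import.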

There's also about 7% of the startup time because numpy.testing imports unittest in order to get TestCase, so people can refer to numpy.testing.TestCase. Even though numpy does nothing to TestCase and some of numpy's own unit tests use unittest.TestCase instead. sigh. And there's nothing to be done to improve that case.

Regarding the age - yes, you're right. BTW, parts of PIL started in 1995, making it the oldest widely used package, I think. Do you know of anything older?