by leephillips 1799 days ago
When those libraries are fast, it is because they are using Numpy routines written in Fortran or C. And you can get a lot done with those libraries, of course. But they’re only fast if your code can be fit into stereotyped vector patterns. As soon as you need to write a loop, you get slow Python performance. Python + Scipy would not be a good choice for writing an ocean circulation or galaxy merger simulation.

EDIT: And last time I checked, Numpy only parallelizes calls to supplied linear algebra routines, and only if you have the right library installed. A simple vector arithmetic operation like a + b will execute on one core only.
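
To make the loop-versus-vectorized contrast concrete, here is a minimal sketch. The function names `add_loop` and `add_vectorized` are made up for illustration. Both compute the same elementwise sum; the first runs the loop in the Python interpreter, while the second hands the whole operation to NumPy's compiled code. As far as I know, neither version spreads the work over more than one core.

```python
import numpy as np


def add_loop(a, b):
    """Elementwise sum with an explicit Python-level loop (interpreter speed)."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def add_vectorized(a, b):
    """Same sum as one NumPy array operation (runs in compiled code)."""
    return a + b


# Small illustrative inputs; the sizes are arbitrary.
a = np.arange(100_000, dtype=np.float64)
b = np.ones_like(a)

# Both approaches agree on the result; only the execution path differs.
assert np.array_equal(add_loop(a, b), add_vectorized(a, b))
```

Timing the two with `timeit` on your own machine is the easiest way to see how large the gap is for a given array size.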


I work in research software for astronomy, and I cannot agree with that. A very large amount of astronomy software is in Python. Numba has gone a long way toward making non-vectorized array operations very fast from Python.

Most people use a ton of numpy and scipy. It turns out that phrasing things as array operations with numpy operators is quite natural in this field, including for things like galaxy merger simulations.

I work, in particular, on asteroid detection and orbit simulation, and it's all pretty much Python.
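
As a rough illustration of what "non-vectorized array operations" means here, consider a recurrence where each output depends on the previous one, so it does not reduce to a single array expression. The sketch below uses Numba's `njit` decorator on a plain loop. The function name `smooth` and the smoothing recurrence are hypothetical examples, not taken from any astronomy package, and the `ImportError` fallback is only there so the snippet still runs (slowly) when Numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: without Numba, fall back to a no-op
    # decorator so the same code runs as ordinary (slow) Python.
    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit()
def smooth(x, alpha):
    """Exponential smoothing: out[i] depends on out[i - 1], so the loop
    carries a dependency that a simple vectorized expression cannot express."""
    out = np.empty_like(x)
    out[0] = x[0]
    for i in range(1, x.shape[0]):
        out[i] = alpha * x[i] + (1.0 - alpha) * out[i - 1]
    return out


result = smooth(np.array([1.0, 2.0, 3.0]), 0.5)
```

The first call pays a compilation cost when Numba is present; later calls with the same argument types reuse the compiled code.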

Numba essentially does the same thing as Julia: it compiles to LLVM IR. In Julia that's a language design decision; in Python it is a library.

You can get very far with these approaches in Python, but having them at the language level just has more potential for optimization and less friction.

The debuggability of Numba code is very limited, and code coverage does not work at all.

Having a high level language that has scientific use at its core is just great.

Python has maturity and community size on its side, but Julia is catching up quickly.

I agree that numba's JITted code needs debuggability improvements. I've been working on getting it to work with Linux's perf(1) for that reason.

The Julia-for-astronomy community is just microscopic right now, so it's hard to find useful libraries. Nothing comes close to, say, Astropy[0].

I'm not a huge fan of the current numpy stack for scientific code. I just don't think anyone should get too carried away and claim that Julia is taking the entire scientific world by storm. I don't know anyone in my department who has even looked at it seriously.

[0] https://www.astropy.org/

I’m aware that there is plenty of serious computation done with these tools. I don’t want to overstate; I merely meant that, for a fresh project, Julia is now a better choice for a large-scale simulation. Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

“It turns out that phrasing things as array operations with numpy operators is quite natural in this field”

But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize. All large-scale computation is multi-core.

> Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

This still seems like an overstatement, but maybe it depends on what you mean by "most demanding level." I work on systems for the Rubin Observatory, which is going to be the largest astronomical survey by a lot. There's a bunch of C++ certainly, but heaps of Python. For example, catalog simulation (https://www.lsst.org/scientists/simulations/catsim) is pretty much entirely in Python.

Take a look at `lsst/imsim`, for example, from the Dark Energy collaboration at LSST: https://github.com/LSSTDESC/imSim.

Maybe this doesn't count as the "most demanding" level, but I don't really see why it wouldn't.

> But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize.

That's correct, but numba will parallelize the computation for you (https://numba.pydata.org/numba-doc/latest/user/parallel.html). It's pretty common to use numba's parallelization when relevant.
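
A minimal sketch of what that looks like for the A + B case, using Numba's documented `parallel=True` option and `prange`. The function name `add_parallel` is made up for illustration, and the `ImportError` fallback is an assumption added so the snippet still runs serially without Numba installed.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for illustration: without Numba, run the same loop serially.
    prange = range

    def njit(*args, **kwargs):
        def wrap(func):
            return func
        return wrap


@njit(parallel=True)
def add_parallel(a, b):
    """Elementwise sum; with Numba, prange lets iterations be split across threads."""
    out = np.empty_like(a)
    for i in prange(a.shape[0]):
        out[i] = a[i] + b[i]
    return out


# Arbitrary illustrative inputs.
a = np.arange(10_000, dtype=np.float64)
b = np.full_like(a, 2.0)
total = add_parallel(a, b)
```

Whether this actually beats plain `a + b` depends on array size and memory bandwidth, so it is worth measuring before relying on it.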

By a large-scale calculation I have in mind something like this: https://arxiv.org/pdf/2006.09368.pdf, which is in your field of astronomy. It uses about a billion dark-matter elements and was run on the Cobra supercomputer at Max Planck, which has about 137,000 CPU cores. It used the AREPO code, which is a C program that uses MPI. If you know of any calculation in this class using Python I would be interested to hear about it. But generally one doesn’t have one’s team write a proposal for time at a national supercomputing center and then, once it is approved and the 100-hour slot is finally scheduled, upload a Python script to the cluster. But strange things happen.

EDIT: Yes, numba is impressive.

Out of curiosity, how does someone get into the work you’re doing? Do you just kind of fall into it accidentally? Get a PhD in astronomical computing (if that’s a thing)?