| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by leephillips 1800 days ago

If you are doing very high-performance numerical work, your choices¹ are Fortran, C, C++, or Julia. Julia is way more fun to program in than the other choices. Also, it has some properties² that make code re-use and re-combination especially easy.

1 https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-cl...

2 https://arstechnica.com/science/2020/10/the-unreasonable-eff...

2 comments

notafraudster 1800 days ago

What's the argument against using R and dropping into RCpp for very limited tasks? I (helped) write a very widely used R modelling package and while I wasn't doing anything on the numerical side, we seemed to get great performance from this approach -- and workflow-wise it wasn't too dissimilar to 25 years ago where I had to occasionally drop in X86 assembly to speed up C code!

(Not a hater of Julia at all, very much think it's a cool language and an increasingly vibrant ecosystem and have been consistently impressed when Julia devs have spoke at events I've attended)

krastanov 1800 days ago

Interoperability between libraries that expect your code to be pure R / pure Python. If you use RCpp or Cython or CPython then you lose much of the magic behind the language that enables the cool (but frequently slow) features. My biggest pain point in this situation: you can not use SciPy or Pillow or Cython code with Jax/Pytorch/Tensorflow (except in very limited fashion).

Differential equation solvers that need to take a very custom function that works on fancy R/Python objects is another example of clumsiness in these drop-to-C-for-speed languages. It works and as a performance-nerd I enjoy writing such code, but it is clumsy.

That type interoperability is trivial in Julia.

tylermw 1800 days ago

Once your Rcpp code is compiled, it's almost indistinguishable from base R (when you're calling it). All R functions eventually end up calling R primitives written in C, and Rcpp just simplifies the process of writing and linking C/C++ code into the R interpreter.

The only difficulty with Rcpp-based R packages is you have to ensure the target system can compile the code, which means having a suitable compiler available.

krastanov 1800 days ago

I wonder how much does it differ from python's use of C or Cython (I have only superficial R skills). The prototypical example of why Python's C prevents interoperability is how the introspection needed by Jax or Tensorflow (e.g. for automatic GPU usage or automatic differentiation) fails when working on Scipy functions implemented in C.

For instance, I imagine there is an R library that makes it easy to automatically run R code on a GPU. Can that library also work with Rcpp functions?

oxinabox 1800 days ago

> it's almost indistinguishable from base R (when you're calling it).

I am very surprised by this. Given how R is extremely dynamic. and has things like lazy-evaluation, that you can rewrite before it is called with substitute. Which I am sure some packages are using in scary and beautiful ways.

_Wintermute 1800 days ago

I think the argument is that most R users don't know C++. So Julia avoids the "2 language problem" that you get with modern scientific computing.

tylermw 1800 days ago

Not much an argument at all, if you ask me. There's definitely a benefit to only having to learn a single language (rather than R and C++), but the library/package ecosystem in R is hard to beat; unless you're doing truly bespoke computational work, the number of mature statistical libraries/packages in R is unmatched. Rcpp's syntactic sugar means most slow R bottlenecks can be written in C++ almost verboten, but without the interpreted performance penalty. One of R's best and under-emphasized features is its straightforward foreign-function interface: it's easy to creating bindings to C/C++/Fortran routines (and Rust support is coming along as well).

I've been impressed with Julia, but it's hard to beat 25 years of package development.

ska 1800 days ago

Same argument as python.

In other words, you can (empirically) get a lot done that way, but there is always friction.

slownews45 1800 days ago

One thing I don't like about the two langauge approach - deployment story get's more complicated it seems?

In my case I went to deploy on a musl system and things with the two language just were a pain to get up and running.

Conversely, everything that was native python ran fine in a musl based python container.

Your native python code just moves also nicely between windows / linux / etc

hobofan 1800 days ago

Development story is as complicated as the tooling makes it to be. With good tooling that e.g. minimizes the amount of glue code and/or makes integration of building the native parts easy, the development story doesn't have to much more complicated.

QuadmasterXLII 1800 days ago

I think that's just being clumped in with "Use C++," which he mentioned as an option

galangalalgol 1800 days ago

and the ffi adds a lot of overhead for granular data. Julia just works fast. My only friction has been offline development, which isn't well supported yet.

devoutsalsa 1800 days ago

Doesn’t Python offer this speed in it’s scientific libraries, too? Or is the answer “yes, if you use the libraries are written in Fortran, C, C++, or Julia!”?

ziotom78 1800 days ago

NumPy is not a good comparison, because Julia can produce faster code which takes less memory [1]. The Python library that is closest to Julia's spirit is Numba [2], and in fact I was able to learn Numba in a few hours thanks to my previous exposure to Julia. (It probably helps that they are both based on LLVM, unlike NumPy.)

However, Numba is quite limited because it only works well for mathematical code (it is not able to apply its optimizations to complex objects, like lists of dictionaries), while on the other side Julia's compiler applies its optimizations to everything.

[1] https://discourse.julialang.org/t/comparing-python-julia-and...

[2] https://numba.pydata.org/

SatvikBeri 1800 days ago

There are a few reasons why Julia still tends to be faster than numpy:

* Julia can do loop fusion when broadcasting, while numpy can't, meaning numpy uses a lot more memory during complex operations. (Numba can handle loop fusion, but it's generally much more restrictive.)

* A lot of code in real applications is glue code in Python, which is slow. I've literally found in some applications that <5% of the time was spent in numpy code, despite that being 90% of the code.

That said, if your code is mostly in numba with no pure python glue code (not just numpy), you probably won't see much of a difference.

leephillips 1800 days ago

When those libraries are fast, it is because they are using Numpy routines written in Fortran or C. And you can get a lot done with those libraries, of course. But they’re only fast if your code can be fit into stereotyped vector patterns. As soon as you need to write a loop, you get slow Python performance. Python + Scipy would not be a good choice for writing an ocean circulation or galaxy merger simulation.

EDIT: And last time I checked, Numpy only parallelizes calls to supplied linear algebra routines, and only if you have the right library installed. A simple vector arithmetic operation like a + b will execute on one core only.

spenczar5 1800 days ago

I work in research software for astronomy, and I cannot agree with that. A very large amount of astronomy software is in Python. Numba has gone a long way toward making non-vectorized array operations very fast from Python.

Most people use a ton of numpy and scipy. It turns out that phrasing things as array operations with numpy operators is quite natural in this field, including for things like galaxy merger simulations.

I work, in particular, on asteroid detection and orbit simulation, and it's all pretty much Python.

maxnoe 1800 days ago

Numba essentially does the same as julia, compile to llvm bytecode, in julia, that's a language design decision, in python it is a library.

You can get very far with these approaches I python, but having these at the language level just has more potential for optimization and less friction.

The debugability of numba code is very limited and code coverage does ot work at all.

Having a high level language that has scientific use at its core is just great.

Python has the maturity and community size on its side, but Jul is catching up on that quickly.

spenczar5 1800 days ago

I agree that numba's JITted code needs debuggability improvements. I've been working on getting it to work with Linux's perf(1) for that reason.

The Julia-for-astronomy community is just microscopic right now, so it's hard to find useful libraries. Nothing comes close to, say, Astropy[0].

I'm not a huge fan of the current numpy stack for scientific code. I just don't think anyone should get too carried away and claim that Julia is taking the entire scientific world by storm. I don't know anyone in my department who has even looked at it seriously.

[0] https://www.astropy.org/

leephillips 1800 days ago

I’m aware that there is plenty of serious computation done with these tools. I don’t want to overstate; I merely meant that, for a fresh project, Julia is now a better choice for a large-scale simulation. Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

“It turns out that phrasing things as array operations with numpy operators is quite natural in this field”

But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize. All large-scale computation is multi-core.

spenczar5 1800 days ago

> Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

This still seems like an overstatement, but maybe it depends on what you mean by "most demanding level." I work on systems for the Rubin Observatory, which is going to be the largest astronomical survey by a lot. There's a bunch of C++ certainly, but heaps of Python. For example, catalog simulation (https://www.lsst.org/scientists/simulations/catsim) is pretty much entirely in Python.

Take a look at `lsst/imsim`, for example, from the Dark Energy collaboration at LSST: https://github.com/LSSTDESC/imSim.

Maybe this isn't the "most demanding" but I don't really know why.

> But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize.

That's correct, but numba will parallelize the computation for you (https://numba.pydata.org/numba-doc/latest/user/parallel.html). It's pretty common to use numba's parallelization when relevant.

leephillips 1800 days ago

By a large-scale calculation I have in mind something like this: https://arxiv.org/pdf/2006.09368.pdf, which is in your field of astronomy. It uses about a billion dark-matter elements and was run on the Cobra supercomputer at Max Planck, which has about 137,000 CPU cores. It used the AREPO code, which is a C program that uses MPI. If you know of any calculation in this class using Python I would be interested to hear about it. But generally one doesn’t have one’s team write a proposal for time on a national supercomputing center and then, if it is approved, when your 100-hour slot is finally scheduled, upload a Python script to the cluster. But strange things happen.

EDIT: Yes, numba is impressive.

devoutsalsa 1800 days ago

Out of curiosity, how does someone get into the work you’re doing? Do you just kind of fall into it accidentally? Get a PhD in astronomical computing (if that’s a thing)?

Hasnep 1800 days ago

You're right, Python and R are good choices if your goals happen to align with what those libraries are optimised for, but outside of that you normally need to start writing your own C or C++.

ska 1800 days ago

> Or is the answer “yes, if you use the libraries are written in Fortran, C, C++, or Julia!”?

That's basically the answer.