by djhaskin987 1798 days ago
Can someone please explain to me, a mere mortal, what is the big deal with Julia? Why use it, when there are so many other good languages out there with more community/support? Honest question.
13 comments

If you are doing very high-performance numerical work, your choices¹ are Fortran, C, C++, or Julia. Julia is way more fun to program in than the other choices. Also, it has some properties² that make code re-use and re-combination especially easy.

1 https://www.hpcwire.com/off-the-wire/julia-joins-petaflop-cl...

2 https://arstechnica.com/science/2020/10/the-unreasonable-eff...

What's the argument against using R and dropping into Rcpp for very limited tasks? I (helped) write a very widely used R modelling package and while I wasn't doing anything on the numerical side, we seemed to get great performance from this approach -- and workflow-wise it wasn't too dissimilar to 25 years ago, when I had to occasionally drop in x86 assembly to speed up C code!

(Not a hater of Julia at all, very much think it's a cool language and an increasingly vibrant ecosystem and have been consistently impressed when Julia devs have spoken at events I've attended)

Interoperability between libraries that expect your code to be pure R / pure Python. If you use Rcpp or Cython or the CPython C API then you lose much of the magic behind the language that enables the cool (but frequently slow) features. My biggest pain point in this situation: you cannot use SciPy or Pillow or Cython code with JAX/PyTorch/TensorFlow (except in a very limited fashion).

Differential equation solvers that need to take a very custom function that works on fancy R/Python objects are another example of clumsiness in these drop-to-C-for-speed languages. It works, and as a performance nerd I enjoy writing such code, but it is clumsy.

That type interoperability is trivial in Julia.

Once your Rcpp code is compiled, it's almost indistinguishable from base R (when you're calling it). All R functions eventually end up calling R primitives written in C, and Rcpp just simplifies the process of writing and linking C/C++ code into the R interpreter.

The only difficulty with Rcpp-based R packages is you have to ensure the target system can compile the code, which means having a suitable compiler available.

I wonder how much it differs from Python's use of C or Cython (I have only superficial R skills). The prototypical example of why Python's C prevents interoperability is how the introspection needed by JAX or TensorFlow (e.g. for automatic GPU usage or automatic differentiation) fails when working on SciPy functions implemented in C.

For instance, I imagine there is an R library that makes it easy to automatically run R code on a GPU. Can that library also work with Rcpp functions?
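
To make the failure mode concrete, here is a rough sketch in plain Python. It is not how JAX or TensorFlow are actually implemented; `Tracer`, `pure_python` and `calls_into_c` are hypothetical names invented for the illustration. The idea is that tracing tools pass a stand-in object through your function and record the operations applied to it. That works for pure Python arithmetic, but a C-implemented function such as `math.sin` insists on a real float and rejects the stand-in.

```python
import math


class Tracer:
    """Hypothetical stand-in object that records the arithmetic applied to it."""

    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Tracer(f"({self.expr} + {_show(other)})")

    def __radd__(self, other):
        return Tracer(f"({_show(other)} + {self.expr})")

    def __mul__(self, other):
        return Tracer(f"({self.expr} * {_show(other)})")

    def __rmul__(self, other):
        return Tracer(f"({_show(other)} * {self.expr})")


def _show(value):
    # Render either a recorded expression or a plain Python constant.
    return value.expr if isinstance(value, Tracer) else repr(value)


def pure_python(x):
    # Only uses operators, so a Tracer can flow through it.
    return 3 * x * x + 1


def calls_into_c(x):
    # math.sin is implemented in C and requires a real number.
    return math.sin(x) + 1


traced = pure_python(Tracer("x")).expr

try:
    calls_into_c(Tracer("x"))
    c_call_traced = True
except TypeError:
    # The C function cannot see through the stand-in object.
    c_call_traced = False
```

Here `traced` holds the recorded expression for the pure Python function, while the call that goes through C raises `TypeError` before anything can be recorded.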

> it's almost indistinguishable from base R (when you're calling it).

I am very surprised by this, given how extremely dynamic R is. It has things like lazy evaluation, where you can rewrite an expression with `substitute` before it is evaluated, which I am sure some packages are using in scary and beautiful ways.

I think the argument is that most R users don't know C++. So Julia avoids the "2 language problem" that you get with modern scientific computing.
Not much of an argument at all, if you ask me. There's definitely a benefit to only having to learn a single language (rather than R and C++), but the library/package ecosystem in R is hard to beat; unless you're doing truly bespoke computational work, the number of mature statistical libraries/packages in R is unmatched. Rcpp's syntactic sugar means most slow R bottlenecks can be written in C++ almost verbatim, but without the interpreted performance penalty. One of R's best and under-emphasized features is its straightforward foreign-function interface: it's easy to create bindings to C/C++/Fortran routines (and Rust support is coming along as well).

I've been impressed with Julia, but it's hard to beat 25 years of package development.

Same argument as python.

In other words, you can (empirically) get a lot done that way, but there is always friction.

One thing I don't like about the two-language approach: the deployment story gets more complicated, it seems?

In my case I went to deploy on a musl system, and things with the two-language setup were just a pain to get up and running.

Conversely, everything that was native Python ran fine in a musl-based Python container.

Your native Python code also just moves nicely between Windows, Linux, etc.

The development story is as complicated as the tooling makes it. With good tooling that e.g. minimizes the amount of glue code and/or makes integration of building the native parts easy, the development story doesn't have to be much more complicated.
I think that's just being clumped in with "Use C++," which he mentioned as an option
And the FFI adds a lot of overhead for granular data. Julia just works fast. My only friction has been offline development, which isn't well supported yet.
Doesn’t Python offer this speed in its scientific libraries, too? Or is the answer “yes, if you use the libraries that are written in Fortran, C, C++, or Julia!”?
NumPy is not a good comparison, because Julia can produce faster code which takes less memory [1]. The Python library that is closest to Julia's spirit is Numba [2], and in fact I was able to learn Numba in a few hours thanks to my previous exposure to Julia. (It probably helps that they are both based on LLVM, unlike NumPy.)

However, Numba is quite limited because it only works well for mathematical code (it is not able to apply its optimizations to complex objects, like lists of dictionaries), while on the other side Julia's compiler applies its optimizations to everything.

[1] https://discourse.julialang.org/t/comparing-python-julia-and...

[2] https://numba.pydata.org/

There are a few reasons why Julia still tends to be faster than numpy:

* Julia can do loop fusion when broadcasting, while numpy can't, meaning numpy uses a lot more memory during complex operations. (Numba can handle loop fusion, but it's generally much more restrictive.)

* A lot of code in real applications is glue code in Python, which is slow. I've literally found in some applications that <5% of the time was spent in numpy code, despite that being 90% of the code.

That said, if your code is mostly in numba with no pure python glue code (not just numpy), you probably won't see much of a difference.
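
A toy sketch of what "loop fusion" means here, using plain Python lists instead of arrays so it runs without any dependencies. The function names are made up for the illustration. The unfused version mirrors what chained whole-array operations do: every step allocates a full-length temporary. The fused version does the same arithmetic in a single pass, which is roughly what Julia's dotted broadcast syntax (e.g. `@. 2 * (a * b + c)`) arranges for you.

```python
def unfused(a, b, c):
    # Each step materializes a full-length temporary,
    # like evaluating 2 * (a * b + c) one array operation at a time.
    tmp1 = [x * y for x, y in zip(a, b)]
    tmp2 = [t + z for t, z in zip(tmp1, c)]
    return [2.0 * t for t in tmp2]


def fused(a, b, c):
    # One pass over the inputs and one output allocation, no intermediates.
    return [2.0 * (x * y + z) for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]

unfused_result = unfused(a, b, c)
fused_result = fused(a, b, c)
```

Both versions return the same values; the difference is two extra full-length allocations in the unfused one, which is where the extra memory traffic comes from on large arrays.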

When those libraries are fast, it is because they are using Numpy routines written in Fortran or C. And you can get a lot done with those libraries, of course. But they’re only fast if your code can be fit into stereotyped vector patterns. As soon as you need to write a loop, you get slow Python performance. Python + Scipy would not be a good choice for writing an ocean circulation or galaxy merger simulation.

EDIT: And last time I checked, Numpy only parallelizes calls to supplied linear algebra routines, and only if you have the right library installed. A simple vector arithmetic operation like a + b will execute on one core only.

I work in research software for astronomy, and I cannot agree with that. A very large amount of astronomy software is in Python. Numba has gone a long way toward making non-vectorized array operations very fast from Python.

Most people use a ton of numpy and scipy. It turns out that phrasing things as array operations with numpy operators is quite natural in this field, including for things like galaxy merger simulations.

I work, in particular, on asteroid detection and orbit simulation, and it's all pretty much Python.

Numba essentially does the same as Julia: it compiles to LLVM IR. In Julia, that's a language design decision; in Python, it is a library.

You can get very far with these approaches in Python, but having these at the language level just has more potential for optimization and less friction.

The debuggability of Numba code is very limited and code coverage does not work at all.

Having a high level language that has scientific use at its core is just great.

Python has the maturity and community size on its side, but Julia is catching up on that quickly.

I agree that numba's JITted code needs debuggability improvements. I've been working on getting it to work with Linux's perf(1) for that reason.

The Julia-for-astronomy community is just microscopic right now, so it's hard to find useful libraries. Nothing comes close to, say, Astropy[0].

I'm not a huge fan of the current numpy stack for scientific code. I just don't think anyone should get too carried away and claim that Julia is taking the entire scientific world by storm. I don't know anyone in my department who has even looked at it seriously.

[0] https://www.astropy.org/

I’m aware that there is plenty of serious computation done with these tools. I don’t want to overstate; I merely meant that, for a fresh project, Julia is now a better choice for a large-scale simulation. Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

“It turns out that phrasing things as array operations with numpy operators is quite natural in this field”

But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize. All large-scale computation is multi-core.

> Note that no combination of any of the faster implementations of Python + Numpy libraries has ever been used at the most demanding level of scientific computation. That has always been Fortran, with some C and C++, and now Julia.

This still seems like an overstatement, but maybe it depends on what you mean by "most demanding level." I work on systems for the Rubin Observatory, which is going to be the largest astronomical survey by a lot. There's a bunch of C++ certainly, but heaps of Python. For example, catalog simulation (https://www.lsst.org/scientists/simulations/catsim) is pretty much entirely in Python.

Take a look at `lsst/imsim`, for example, from the Dark Energy collaboration at LSST: https://github.com/LSSTDESC/imSim.

Maybe this isn't the "most demanding" but I don't really know why.

> But if A and B are numpy arrays, then A + B will calculate the elementwise sum on a single core only, correct? It will vectorize, but not parallelize.

That's correct, but numba will parallelize the computation for you (https://numba.pydata.org/numba-doc/latest/user/parallel.html). It's pretty common to use numba's parallelization when relevant.

Out of curiosity, how does someone get into the work you’re doing? Do you just kind of fall into it accidentally? Get a PhD in astronomical computing (if that’s a thing)?
You're right, Python and R are good choices if your goals happen to align with what those libraries are optimised for, but outside of that you normally need to start writing your own C or C++.
> Or is the answer “yes, if you use the libraries that are written in Fortran, C, C++, or Julia!”?

That's basically the answer.

There really aren't that many languages out there trying to be on the cutting edge of JIT for scientific computing with a great REPL experience. There are a few areas where developers have to prototype in a language like Python or MATLAB to design their systems, generate test data, and even just plot stuff during debugging then rewrite in C/C++ for speed. It's an enormous time sink that is prone to errors, and leads to terrible SWE culture.

If Julia can provide both the REPL/debugging experience of a language like Python or MATLAB with a fast enough JIT to use in production it would be an enormous boon to productivity and robustness.

There are a few limiting factors but I don't think they're absolute.

I think your question presupposes a lot of assumptions that may not be right. For one, I don't know that Julia is like a "big deal", certainly Python is the big deal in this field and I doubt Julia is looking to displace it wholesale. That said, Julia is a great addition to the scientific computing landscape because of its performance compared to other languages and its use of modern programming features. Python is just really really slow compared to Julia and parallelism in Python is a huge pain. Fortran is really really fast but that comes at a cost of being awkward to use and coming with a great deal of baggage. Julia is fast, feels modern, and has pretty easy parallelism.

Then there's Matlab, Mathematica, and they are also pretty good but they're closed source/proprietary, so their ecosystem is mostly limited and driven by commercial interests. Nothing wrong with that intrinsically and they're all widely used but it's one way Julia differentiates itself, by making the language open and making money through services.

Julia Computing is not a services company. There are commercial products built off of this stack which are the core of Julia Computing. For example, https://pumas.ai/ is a product for pharmacology modeling and simulation, and runs on the JuliaHub cloud platform of Julia Computing. It is already a big deal in the industry, with the quote everyone refers to "Pumas has emerged as our 'go-to' tool for most of our analyses in recent months" from the Director Head of Clinical Pharmacology and Pharmacometrics at Moderna Therapeutics during 2020 (for details, see the full approved press release from the Pumas.ai website). JuliaSim is another major product which is being released soon, along with JuliaSPICE publicly in the pipeline.

But indeed, Julia Computing differentiates itself from something like MATLAB or Mathematica by leveraging a strong open source community on which these products are developed. These products add a lot of the details that are generally lacking in the open source space, such as strong adherence to file formats, regulatory compliance and validation, GUIs, etc. which are required to take such a product from "this guy can use it" to a fully marketable product usable by non-computational scientists. I will elaborate a bit more on this at JuliaCon next week in my talk on the release of JuliaSim.

Wanted to ask if JuliaDB is something that might get more development attention? Or will that remain a community project? (I see it’s been in need of a release for a while.)
It is not in our current set of major products. That said, informal office discussions mentioned JuliaDB as recently as last week, so it's not forgotten. If there's a demonstrated market, say a need for new high-performance cloud data science tools as part of the pharmaceutical domains we work in, then something like JuliaDB could possibly be revived in the future (of course, this is no guarantee).
In general, the community has discussed reviving the project (or at least the ideas and some of its codebase). Julia Computing will also be contributing as part of that revival.
Thank you both for the comments. I believe I remember early on there were some comparisons to kdb+/q. I think there is some pretty great potential with an offering like this (an in-memory database integrated with the language, coupled with solid static storage) from the Julia community going forward. I can envision some use cases in genomics/transcriptomics.
Composability via dispatch-oriented programming, e.g. [1]

It also pretty much solved my version of the two-language problem, but that means different things to different people so ymmv.

[1] https://www.youtube.com/watch?v=kc9HwsxE1OY

>Why use it, when there are so many other good languages out there with more community/support? Honest question.

Such a question seems sort of in bad faith (or loaded), since the selling points of Julia have been hammered time and again on HN and elsewhere, and are prominent on its website. It's a 1 minute search to find them, and if someone is already aware that there's this thing called Julia to the point that they think it's made to be "a big deal", they surely have seen them.

So, what could the answer to the question above be? Some objective number that shows Julia is 25.6% better than Java or Rust or R or whatever?

But first, who said it's a "big deal"? It's just a language that has some development action, sees some adoption, and secured modest funding for its company. That's not some earth-shattering hype (if you want to see that, try to read about when Java was introduced. Or, to a much lesser degree, Ada, for that matter).

You use a language because you've evaluated it for your needs and agree with the benefits and tradeoffs.

Julia is high level and at the same time very fast for numerical computing allowing you to keep a clean codebase that's not a mix of C, C++, Fortran and your "real" language, while still getting most of the speed and easy parallelization. It also has special focus on support for that, for data science, statistics, and science in general. It's also well designed.

On the other hand, it has slow startup/load times, incomplete documentation, smaller ecosystem, and several smaller usability issues.

While you have a valid perspective, the HN guidelines [1] do specifically ask us to assume good faith.

[1] https://news.ycombinator.com/newsguidelines.html

You assume they've seen the posts about Julia on HN. If you're not interested in PL it's fair to assume they might not click on those posts.
Short answer is that it is (imo) by far the best language for writing generic and fast math. Multiple dispatch allows you to write math using normal notation and not have to jump through hoops to do so.
Can you show a simple example of that? I tend to see multiple dispatch as a mental burden (e.g.: when I see a function call, where will it be dispatched? The answer depends on the types that I'm juggling, which may not even be visible at that point...)
The key to make multiple dispatch work well is that you shouldn't have to think about what method gets called. For this to work out, you need to make sure that you only add a method to a function if it does the "same thing" (so don't use >> for printing for example). To see the benefit of this in action, consider that in Julia `1im + 2//3` (the syntax for sqrt(-1) plus two thirds) works and gives you a complex rational number (`2//3 + 1//1*im`). To get this behavior in most other languages, you would have to write special code for complex numbers with rational coefficients, but in Julia this just works since complex and rational numbers can be constructed using anything that has arithmetic defined. This goes all the way up the stack in Julia. You can put these numbers in a matrix, and matrix multiplication will just work, you can plot functions using these numbers, you can do GPU computation with them, etc. All of this works (and is fast) because multiple dispatch can pick the right method based on all the argument types.
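
For readers who don't have Julia at hand, here is a rough Python sketch of the "complex number over any component type" part of that idea. It is not Julia's implementation, and it relies on duck typing rather than multiple dispatch; `Cx` and `_lift` are hypothetical names invented for the example. The arithmetic is written once and works for rational or floating-point components alike.

```python
from dataclasses import dataclass
from fractions import Fraction
from numbers import Real


@dataclass(frozen=True)
class Cx:
    """Hypothetical complex number whose parts can be any real-like type."""

    re: Real
    im: Real

    def __add__(self, other):
        other = _lift(other)
        return Cx(self.re + other.re, self.im + other.im)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        return Cx(
            self.re * other.re - self.im * other.im,
            self.re * other.im + self.im * other.re,
        )

    __rmul__ = __mul__


def _lift(value):
    # Treat a plain real number as a complex number with zero imaginary part.
    return value if isinstance(value, Cx) else Cx(value, 0)


# Analogue of Julia's `1im + 2//3`: the real part stays an exact fraction.
z = Cx(0, 1) + Fraction(2, 3)
z_squared = z * z

# The same code with float components.
i_squared = Cx(0.0, 1.0) * Cx(0.0, 1.0)
```

What this sketch cannot show is the performance side: Julia compiles a specialized method for each concrete component type, whereas the Python version pays for dynamic lookups on every operation.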
> The key to make multiple dispatch work well is that you shouldn't have to think about what method gets called. For this to work out, you need to make sure that you only add a method to a function if it does the "same thing" (so don't use >> for printing for example).

Thanks for writing this. I think it is an important concept for getting started with Julia. When I tried Julia, I was initially confused and concerned about the subtyping hierarchy, which as far as I understand is undocumented. "Apart from a partial description in prose in Bezanson [2015], the only specification of subtyping is 2,800 lines of heavily optimized, undocumented C code." [0]. Assurance that users can safely ignore the subtyping hierarchy if we maintain semantic equivalence between methods, and that this actually works out in practice, makes it easier to commit to using the language.

[0] https://dl.acm.org/doi/10.1145/3276483

This is an important point: abstractions like generic functions are only as good as how much people respect them, and enforcement of that is largely cultural. You can add technological levels of enforcement, but those are limited at best because people are brilliantly devious and can find ways to break any technological enforcement if they set their minds to it; the key to preventing that is convincing them to set their minds to respecting abstractions instead.

For a negative example of this, C++ introduced function overloading, which is a static and therefore broken version of multiple dispatch (that calls the wrong “methods” based only on static type information). They then immediately decided to abuse the bitshift operators for I/O, in the standard library, no less. (There are other abuses of overloading but this is the most famous one.) So that didn’t exactly create a culture where people respect the semantic meaning of overloaded functions. As a result, C++ has given function overloading a really horrible reputation. Partly because it is likely not to do what you actually want, since it’s static rather than based on the actual types of arguments, but even more so because there’s no culture of semantic consistency in C++, starting from the standard library itself: you cannot trust anyone to respect the meaning of anything, not even the language authors.

In Julia, on the other hand, we’ve always been very strict about this: don’t add a method to a function unless it means the same thing. We would never dream of using bitshift operators to do I/O. Since meaning is the level where human thinking operates, this makes it reasonable to write generic code that works the way you meant it to: you can call `+` on two things and just trust that whoever has implemented `+` for those objects didn’t decide to make it append a value to an array or something wonky like that. Not that people haven’t proposed that kind of thing, but because of the culture, it gets shot down in the standard library and elsewhere in the ecosystem.

But yeah, it’s hard to see how this would happen because there is nothing technical preventing the same kinds of abuses that are rampant in C++; it’s just the invisible but extremely strong force of culture.

> We would never dream of using bitshift operators to do I/O

Or the sum operator for string concatenation, which is the epitome of non-commutative operation.

I like your point of view, but still I'm skeptical of function and especially operator overloading. Shouldn't these semantic constraints that you mention be enforced by the language? For example, the language does not let you overload + for a non-commutative operation, and so on.

> make sure that you only add a method to a function if it does the "same thing"

But this only concerns when I'm writing the code myself. If I read some code and I see a few nested function calls, there's a combinatorial explosion of possible types that gives me vertigo.

> complex and rational numbers can be constructed using anything that has arithmetic defined.

seriously? this does not seem right, it cannot be like that. If I build a complex number out of complex numbers, I expect a regular complex number, not a "doubly complex" number with complex coefficients, akin to a quaternion. Or do you? There is surely some hidden dirty magic to avoid that case.

There's no hidden dirty magic, but you're right: Complex numbers require `Real` components and Rationals require `Integer` numerators and denominators. Both `Real` and `Integer` are simply abstract types in the numeric type tree, but you're free to make your own. You can see how this works directly in their docstrings — it's that type parameter in curly braces that defines it, and `<:` means subtype:

    help?> Complex
    search: Complex complex ComplexF64 ComplexF32 ComplexF16 completecases
    
      Complex{T<:Real} <: Number
    
      Complex number type with real and imaginary part of type T.
One other really good example is BLAS. Since it is a C/Fortran library you have 26 different functions for matrix multiply depending on the type of matrix (and at least that many for matrix vector multiply). In Julia, you just have * which will do the right thing no matter what. In languages without multiple dispatch, any code that wants to do a matrix multiply will either have to be copied and pasted 25 times for each input type, or will have 50 lines of code to branch on the input type. Multiple dispatch makes all of that pain go away. You just use * and you get the most specific method available.
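
A minimal sketch of the "one `*`, many methods" idea, written in Python so it can be run anywhere. Python has no built-in multiple dispatch, so the registry below (`register`, `multiply`, `_methods`) is a hypothetical toy, and its lookup order is a simplification rather than Julia's actual specificity rules. The matrix classes `Dense` and `Diagonal` are likewise made up for the example.

```python
import itertools

_methods = {}


def register(*types):
    """Register an implementation for one combination of argument types."""
    def decorator(fn):
        _methods[types] = fn
        return fn
    return decorator


def multiply(a, b):
    # Try combinations of both arguments' classes, most specific first.
    for key in itertools.product(type(a).__mro__, type(b).__mro__):
        if key in _methods:
            return _methods[key](a, b)
    raise TypeError(f"no method for {type(a).__name__}, {type(b).__name__}")


class Dense:
    def __init__(self, rows):
        self.rows = rows


class Diagonal(Dense):
    def __init__(self, diag):
        self.diag = diag
        n = len(diag)
        super().__init__(
            [[diag[i] if i == j else 0 for j in range(n)] for i in range(n)]
        )


@register(Dense, Dense)
def dense_dense(a, b):
    # Generic fallback: full row-by-column product.
    cols = list(zip(*b.rows))
    return Dense(
        [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a.rows]
    )


@register(Diagonal, Diagonal)
def diagonal_diagonal(a, b):
    # Specialized: only the diagonals need multiplying.
    return Diagonal([x * y for x, y in zip(a.diag, b.diag)])


@register(Diagonal, Dense)
def diagonal_dense(a, b):
    # Specialized: scale each row of b by the matching diagonal entry.
    return Dense([[d * x for x in row] for d, row in zip(a.diag, b.rows)])


dd = multiply(Diagonal([2, 3]), Diagonal([4, 5]))
dm = multiply(Diagonal([2, 3]), Dense([[1, 2], [3, 4]]))
md = multiply(Dense([[1, 2], [3, 4]]), Diagonal([2, 3]))
```

The caller always writes `multiply(a, b)`. Which implementation runs depends on the types of both arguments, and a combination with no specialized method (here `Dense` times `Diagonal`) falls back to the generic one.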
We had three big data pipelines written in numpy that we'd spent a lot of time optimizing. Rewriting them in Julia, we were able to get an 8x (serial -> serial), 14x (parallel -> serial), and 28x (parallel -> parallel) speedups respectively – and with clearer, more concise code. The difference is huge.
did you end up using package compiler as well?
No, the pipelines are long enough that compilation time isn't a big issue.
It’s a solid combination of performance, easy syntax and flexible environment. A big drawback of Python is that any performant code is actually written in a lower level language, with foreign function calls.

That’s not to say that there are no disadvantages to Julia. I personally see Julia as a beefed up, new and improved R.

I have been using Julia extensively since 2013, and I can say that it's awesome! But don't try to use it if you're looking for a general-purpose scripting language: Python is far better suited for this. Similarly, if you want to produce standalone executables, C++, Rust, Go or Nim are better.

However, Julia is perfect if you write mathematical/physical/engineering simulations and data analysis codes, which is my typical use case. Its support for multiple dispatch and custom operators lets you write very efficient code without sacrificing readability, which is a big plus. Support for HPC computing is very good too.

>Python is far better suited for this. Similarly, if you want to produce standalone executables, C++, Rust, Go or Nim are better.

That's the case now, because Julia made a design decision to focus on extreme composability, dynamism, generic codegen etc which involved compiler tradeoffs...but it's not inherent to the language.

For scripting, interpreted Julia is coming. For executables, small binary compilation is as well...particularly bullish on this given the new funding

> For scripting, interpreted Julia is coming.

Citation for this? Julia has had a built-in interpreter since 1.0, in 2018; use `--compile=min` or `--compile=none` to make use of it. And JuliaInterpreter.jl has been working since 2018. Both are very slow -- slower than adding in the compile time for most applications. As I understand it, this is because a number of things in how the language works are predicated on having an optimizing JIT compiler, as is how the standard library and basically all packages are written.

Julia is going to over time become nicer for scripting, just because of various improvements. In particular, I put more hope on caching native code than on any new interpreter.

Yeah, you are right, these limitations are not much of a matter of the language itself.
From what I understand, Julia is dynamically typed and similar to a scripting language like Python or Ruby, but is also compiled, so it has performance similar to C/C++ (it's also written in itself). It also has built-in support for parallelism, multi threading, GPU compute, and distributed compute. I'm sure others can provide more insight. I've only dabbled in it and haven't used it extensively in any sense of the word.
My understanding is that it runs faster than native Python and R. That said, with Numba and other libraries, I see no point.
Numba is great for pure functions on primitive types but it breaks down when you need to pass objects around. PyPy is fantastic for single-threaded applications but doesn't play nicely with multiprocessing or distributed computing IME. Numpy helps for stuff you can vectorize, but there's a lot of stuff you can't (or can but shouldn't); it also brings lots of minor inconveniences by virtue of not being a native type - e.g., the code to JSON serialize a `Union[Sequence[float], np.ndarray]` isn't exactly Pythonic.
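
To illustrate the serialization annoyance, here is one possible workaround, sketched with the standard library only. `ArrayFriendlyEncoder` and `FakeArray` are hypothetical names; the encoder assumes array-like objects expose a `tolist()` method, which NumPy arrays do, and `FakeArray` merely stands in for one so the snippet runs without NumPy installed.

```python
import json


class ArrayFriendlyEncoder(json.JSONEncoder):
    """Hypothetical encoder that converts array-like objects via tolist()."""

    def default(self, obj):
        tolist = getattr(obj, "tolist", None)
        if callable(tolist):
            return tolist()
        # Fall back to the standard behaviour (raises TypeError).
        return super().default(obj)


class FakeArray:
    """Stand-in for an array type; only mimics the tolist() method."""

    def __init__(self, values):
        self._values = list(values)

    def tolist(self):
        return list(self._values)


payload = {"a": [1.0, 2.0], "b": FakeArray([3.0, 4.0])}
encoded = json.dumps(payload, cls=ArrayFriendlyEncoder)
```

Plain sequences go through the normal path, and only the array-like value needs the custom hook, which is exactly the kind of extra plumbing a native array type would not require.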
Also, numpy has about a 100x overhead for small arrays (10 or fewer elements).
> runs faster than native python and R

That's a bit of an understatement. It's about as fast as C and Rust (ignoring JIT compilation time).

https://julialang.org/benchmarks/

It's easier to write Julia code than to deal with Numba tbh and the ecosystem around Julia makes the code composable which is often not the case if you write Numba code and have to deal with other libraries.
Composability, speed, static analysis, type system, abstractions, user defined compiler passes, metaprogramming, ffi, soon static compilation, differentiability and more create an effect that far exceeds numba
Basically, the syntax is similar to MATLAB, and it has lots of the same features around function and variable declaration as Python. The best thing for sure is the native support for parallel processing, whereas Python is single-threaded.
kinda tries to make coding like python but running like fortran (without having to resort to side-batteries like numpy/scipy)

designers seem to have a good amount of PLT knowledge and made good foundations