Hacker News
by dlphn___xyz 2540 days ago
What's the selling point of Julia? Why would I use it over something like R?

In R, most of the high performance code isn't written in R, it's written in Fortran or C or C++ (R has really good C++ integration via Rcpp). Python has something similar. The value prop of Julia is supposed to be that you have a language flexible enough to do the high-level stuff you'd normally do in R/Python, plus the ability to write high-performance code without having to drop into another language.

I remain skeptical that this solves a lot of real-world problems (I know a lot of users of R/Python who never need to resort to writing their own C/C++ code), but that's the sales pitch.

I think if you're just plugging together reasonably "vanilla" components from Python / R libraries and only using vectorised operations, those languages are fine: you can get away with vectorised libraries that wrap C++.

Julia shines when your workload can't be phrased by stringing together the limited set of vectorised verbs that Python / R libraries give you: anything stateful and loopy, like reinforcement learning, systematic trading, Monte Carlo simulations, etc. It's also useful if you really care about performance and are doing "vanilla" computations at a truly large scale. If you want to avoid copying memory (i.e. the temporary arrays that chained vectorised operations allocate), or want to tightly optimise / fuse some numerical operations, it's great.
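To make "stateful and loopy" concrete, here is an illustrative Python sketch of a small Monte Carlo simulation: a fair ±1 random walk that stops when it hits 0 or a target. Each path's next step depends on its current position, and paths stop at different times, so it doesn't map neatly onto a fixed chain of whole-array operations. In pure Python this inner loop is exactly the slow part. The function name and parameters are hypothetical.

```python
import random


def hit_target_probability(start: int, target: int, n_paths: int, seed: int = 0) -> float:
    """Estimate P(a fair +/-1 random walk from `start` reaches `target` before 0)."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(n_paths):
        position = start
        # Stateful loop: keep stepping until an absorbing barrier is hit.
        while 0 < position < target:
            position += 1 if rng.random() < 0.5 else -1
        if position == target:
            hits += 1
    return hits / n_paths


if __name__ == "__main__":
    print(hit_target_probability(start=3, target=10, n_paths=20_000))
```

For a fair walk the exact answer is start / target, so the call above should print something close to 0.3.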

The other issue with Python / R wrapping C++ libraries is that different libraries generally don't play well together (without coming back out into Python / R space and doing a lot of copying / allocation). This tends to encourage large monolithic C/C++ codebases like NumPy and pandas, which are pretty impenetrable and difficult to extend / modify.

One more advantage of these libraries being written in Julia is that, if they almost do what you need but not quite, it's often pretty easy to reach inside and patch the function that needs changing. You already speak the language and don't need to stop the world to do this. The barrier to doing this with (say) NumPy is much higher.
It's supposed to be faster
It's a bit more nuanced than that. It's "as fast" without having to write any C.

I tried to recreate something like AlphaGo in Python using Keras. I never got the learning to work (probably because I was impatient and training on a laptop CPU), but a lot of the CPU time was simply being spent on manipulating the board state.

So I ported my "Board" object to Rust, and it was a lot faster. Operations like counting liberties or removing dead stones sped up considerably, which was important.

Then I rewrote the whole thing in Julia and it was just as fast as my Python / Rust combo.

So I saw for myself that Julia does solve the two language problem. It is as pleasant to write as Python (and I like it better actually), and performed as well as Rust, based on my informal benchmarks.
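To show what that board-state code looks like, here is a minimal Python sketch of liberty counting via flood fill. The board representation (a list of strings using 'B', 'W' and '.') and the function name are assumptions for illustration, not the commenter's actual "Board" object. A group whose liberty set is empty is dead and would be removed. Tight, branchy loops over small data like this are what ran slowly in Python and fast in Rust or Julia.

```python
from typing import List, Set, Tuple

Point = Tuple[int, int]


def group_and_liberties(board: List[str], row: int, col: int) -> Tuple[Set[Point], Set[Point]]:
    """Return (stones in the group containing (row, col), liberties of that group)."""
    color = board[row][col]
    if color == ".":
        raise ValueError("no stone at the given point")
    n_rows, n_cols = len(board), len(board[0])
    group: Set[Point] = set()
    liberties: Set[Point] = set()
    stack = [(row, col)]
    # Iterative flood fill over orthogonally connected stones of the same colour.
    while stack:
        r, c = stack.pop()
        if (r, c) in group:
            continue
        group.add((r, c))
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if not (0 <= nr < n_rows and 0 <= nc < n_cols):
                continue  # off the board
            neighbour = board[nr][nc]
            if neighbour == ".":
                liberties.add((nr, nc))  # empty point next to the group
            elif neighbour == color and (nr, nc) not in group:
                stack.append((nr, nc))  # same colour: part of the group
    return group, liberties
```

For example, a lone white stone surrounded on all four sides by black stones comes back with an empty liberty set.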

What's the nuance? It's much faster?
Since neither of the others mentioned it: the nuance is the type system + multimethods. Julia has a gradually typed system that fully specializes code where it can (aided by the expressive power of multimethods). Hence, with some careful (or overkill) placement of types (and multimethods), it's easy to get large performance boosts from minor edits to one's code, rather than porting the whole thing to C, which is the Python strategy. But the first pass can still be written like Python code, with no typing at all (and sometimes you will get lucky, or be very smart, and that will be fast through specialization without extra work).

As a brief demonstration I can write:

foo(a, b, c) = (a + b) * c

When I call it on integers, it emits only the necessary integer assembly; when I call it on floats, only the necessary float assembly; and when I broadcast it across vectors (foo.(a, b, c)), it can emit SIMD (e.g. SSE/AVX) instructions. Only when the compiler can't infer the incoming types does it emit any sort of dynamic type-checking code. The calling function can be ignorant of the types too, and so on up the call chain, until a user passes in an integer or a float, at which point all of the code is specialized to be as fast as possible.
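As a rough analogy (a toy, not how Julia is implemented), here is a Python decorator that keeps one entry per tuple of concrete argument types, loosely like Julia keeping a compiled specialization per call signature. Python can't emit type-specific machine code, so "compiling" here just stores the function. The point is only that specialization is keyed on the types actually passed in at call time. `specialize_by_type` is a hypothetical name.

```python
import functools
from typing import Any, Callable, Dict, Tuple


def specialize_by_type(func: Callable[..., Any]) -> Callable[..., Any]:
    """Toy per-signature cache: one entry per tuple of concrete argument types."""
    cache: Dict[Tuple[type, ...], Callable[..., Any]] = {}

    @functools.wraps(func)
    def wrapper(*args: Any) -> Any:
        signature = tuple(type(a) for a in args)
        specialized = cache.get(signature)
        if specialized is None:
            # In Julia, type inference and LLVM code generation would run here,
            # once per new signature. This toy just records the original function.
            specialized = func
            cache[signature] = specialized
        return specialized(*args)

    wrapper.cache = cache  # type: ignore[attr-defined]  # exposed for inspection
    return wrapper


@specialize_by_type
def foo(a, b, c):
    return (a + b) * c
```

Calling foo(1, 2, 3) and then foo(1.0, 2.0, 3.0) creates two cache entries, and a later foo(4, 5, 6) reuses the integer one.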

I wouldn’t say that Julia is gradually typed in the same way that Cython or Numba is. In my experience, you usually improve performance by ensuring that your functions handle different types generically. One example is making sure the compiler can infer the types of all your variables as something more specific than the Any type. Another is being careful to avoid accidental type changes, e.g. adding a Float64 literal to a Float32 value.

As I’ve learned the language it’s become pretty easy to avoid those pitfalls even on initial implementations. That said, providing types in function signatures is still very useful for multiple dispatch and providing a more usable API in libraries.
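The "accidental type change" pitfall is easiest to see with an accumulator. Inside a Julia function, a loop like s = 0; for x in xs; s += x; end over Float64 values starts s as an Int and turns it into a Float64, so s is no longer type-stable. Initialising it with zero(eltype(xs)) is the usual fix. The Python sketch below only records the types the running sum takes (Python won't show the slowdown). `accumulator_types` is a hypothetical helper.

```python
from typing import List, Sequence


def accumulator_types(xs: Sequence[float], start: float) -> List[type]:
    """Record the type of the running sum after initialisation and after each step."""
    total = start
    seen = [type(total)]
    for x in xs:
        total += x
        seen.append(type(total))
    return seen


values = [0.5, 1.5, 2.0]
unstable = accumulator_types(values, 0)   # starts as int, becomes float
stable = accumulator_types(values, 0.0)   # float the whole way through
```

Starting from 0.0 keeps the accumulator's type fixed, which is the Python analogue of zero(eltype(xs)).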

The nuance is that for someone who mainly just calls functions from packages, they probably won’t notice any real speed difference, since performance-sensitive packages in Python and R are typically written in C or C++. Additionally, there are various tools like Numba for accelerating Python code that will make certain restricted subsets of Python just as fast as (or sometimes faster than) Julia.

However, as soon as you try to do something a bit more complicated then you’ll notice the speed and flexibility differences.

I dunno about this. At work, we have an exhaustive model fitting procedure that takes a looooonnnng time.

I prototyped a quick Julia implementation of a simple GLM (almost identical code in Julia and R), and the Julia code was approximately 10-20 times faster depending on the model.

This is definitely worth looking at (mind you, the cost of redeveloping our code in Julia is probably prohibitive). That being said, this would encourage me to call out to Julia from R for some of my more computationally heavy workloads.
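For a sense of what fitting "a simple GLM" involves, here is a minimal pure-Python sketch of logistic regression fitted by Newton-Raphson, which for the logit link is the same as IRLS. The thread doesn't say which family, link, or fitting method was used, so logistic regression with one predictor is an assumption, and `fit_logistic_glm` is a hypothetical name. Written as explicit loops (iterations × observations), code like this pays interpreter overhead on every step in R or Python, whereas Julia compiles the same loops to native code.

```python
import math
from typing import Sequence, Tuple


def fit_logistic_glm(x: Sequence[float], y: Sequence[int],
                     max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float]:
    """Fit y ~ Bernoulli(logistic(b0 + b1 * x)) by Newton-Raphson (IRLS for the logit link)."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0          # score: gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # Fisher information X^T W X for (b0, b1)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system H * step = g explicitly.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1
```

On data where the classes overlap (not perfectly separable), the returned coefficients satisfy the score equations sum(y - p) = 0 and sum((y - p) * x) = 0 to within rounding. Production GLM code works with a general design matrix and a linear solver rather than a hand-written 2x2 inverse.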

Julia code gets compiled native via LLVM so it is about as fast as other natively compiled languages.
Not making any claims about Julia, but “gets compiled native via LLVM” doesn’t imply “is about as fast as other natively compiled languages”.

For example, a straightforward Python-to-LLVM compiler would generate code in which every variable is a PyObject (https://docs.python.org/3/c-api/structures.html) instance, with “switch(obj.ob_type)” equivalents that would require a “sufficiently advanced compiler” to reach the same speed as, say, C.
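Here is a small Python sketch of the "every value is a tagged box" model described above. `Boxed` is a hypothetical stand-in for PyObject, and `boxed_add` plays the role of the switch(obj.ob_type) that runs on every single operation. `unboxed_sum` shows the loop a specializing compiler could emit once it has proven every element is a float. CPython itself still runs that function dynamically, so the sketch illustrates the shape of the generated code rather than its speed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Boxed:
    """Hypothetical stand-in for a PyObject: a type tag plus a payload."""
    tag: str  # "int" or "float"
    value: object


def boxed_add(a: Boxed, b: Boxed) -> Boxed:
    # Equivalent of switch(obj.ob_type): inspect both tags on every operation,
    # then allocate a new box for the result.
    if a.tag == "int" and b.tag == "int":
        return Boxed("int", a.value + b.value)
    if a.tag in ("int", "float") and b.tag in ("int", "float"):
        return Boxed("float", float(a.value) + float(b.value))
    raise TypeError(f"unsupported operand tags: {a.tag}, {b.tag}")


def boxed_sum(xs: List[Boxed]) -> Boxed:
    """Sum boxed values, dispatching on tags at every step."""
    total = Boxed("int", 0)
    for x in xs:
        total = boxed_add(total, x)
    return total


def unboxed_sum(xs: List[float]) -> float:
    # Shape of the code once every element is known to be a float:
    # one float add per iteration, with no tag checks and no result boxes.
    total = 0.0
    for x in xs:
        total += x
    return total
```

Summing Boxed("int", 1), Boxed("float", 2.5) and Boxed("int", 3) goes through three tag dispatches and ends with Boxed("float", 6.5).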