Hacker News
by jernfrost 3050 days ago
When you write code in a C-like fashion, it can often get as fast as or faster than C. Of course, it has the disadvantage of all JIT languages that it will likely have slower startup. But when you crunch big chunks of data, it should be able to compete with C.

Actually, I want to qualify this a bit. Nothing gets fast automatically just by picking a fast language. That Julia is close to C in performance mainly means that it is possible to achieve this through optimization without too much effort. However, you still need to have some idea of what makes something slow or fast in Julia.

Personally I find it very easy to optimize Julia code because you can quickly see how the JIT transforms an individual function and what you may need to change.
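
As a rough sketch of what that inspection can look like, the snippet below uses the standard `@code_warntype` and `@code_llvm` macros on a small function. The function `sumsq` is a hypothetical example, not something from this thread.

```julia
using InteractiveUtils  # provides @code_warntype and @code_llvm outside the REPL

# Hypothetical example function: sum of squares over a collection.
function sumsq(xs)
    total = zero(eltype(xs))
    for x in xs
        total += x * x
    end
    return total
end

# Print the inferred types for this call; non-concrete types point at
# places where the compiler could not specialize.
@code_warntype sumsq([1.0, 2.0, 3.0])

# Print the LLVM IR generated for this specific argument type.
@code_llvm sumsq([1.0, 2.0, 3.0])
```

If the `@code_warntype` output shows an abstract type such as `Any` for a variable, that is usually the first thing to look at.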


Ok thanks. Lately been using Cython with its near-C speed and Python convenience. Just quickly had a look at Julia wiki page, seems like a fast Python, is that fair?
Python doesn't have multiple dispatch though. It's central to making Julia's type system fast. I.e., it's not just loops that are fast: code that uses Julia-defined data structures is fast too. Additionally, you can write generic code and have it automatically compile fast code for large classes of types. Also, the whole Base library is written in Julia, mostly generically, so it will be compatible with your user-defined data structures as well.
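
A minimal sketch of that point, assuming a made-up `Dual` type (hypothetical, for illustration only): the type extends `+` and `*` through dispatch, and the generic `sum` from Base then works on it without knowing anything about it.

```julia
# Hypothetical user-defined type: a value together with a derivative.
struct Dual
    val::Float64
    der::Float64
end

# Extend Base operators for the new type via multiple dispatch.
Base.:+(a::Dual, b::Dual) = Dual(a.val + b.val, a.der + b.der)
Base.:*(a::Dual, b::Dual) = Dual(a.val * b.val, a.val * b.der + a.der * b.val)

# Generic code written with no knowledge of Dual; `sum` comes from Base.
sum_of_squares(xs) = sum(x -> x * x, xs)

r_float = sum_of_squares([1.0, 2.0])                      # plain Float64 input
r_dual  = sum_of_squares([Dual(1.0, 1.0), Dual(2.0, 0.0)]) # user-defined input
```

The same `sum_of_squares` definition is compiled separately for `Vector{Float64}` and `Vector{Dual}`, which is the "generic code, specialized machine code" behaviour described above.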

Then finally, the kicker is that Julia, especially on v0.7, doesn't optimize functions separately: it optimizes them together. It will inline small functions into others, perform interprocedural optimizations, utilize compile-time constants, etc. Thus when the code is Julia all the way down, it can and will compile everything together and optimize it a lot more than functions compiled separately, giving a lot more performance benefits. When you add in the macros that turn off things like bounds checks and add explicit SIMD, you truly get C-level performance, and many times beyond, because your code is so architecturally and vertically optimized (it's like you put on the flags to say "compile code that only works for this current machine with this current codebase", and it can safely make this assumption because it's JITing).
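
For illustration, the macros in question are `@inbounds` and `@simd`. Below is a minimal sketch on a dot product; the function name `dot_fast` is hypothetical, and the explicit length check is there because `@inbounds` removes the safety net.

```julia
# Hypothetical dot product over two Float64 vectors.
function dot_fast(x::Vector{Float64}, y::Vector{Float64})
    # Check lengths up front, since bounds checks are disabled in the loop.
    length(x) == length(y) || throw(DimensionMismatch("x and y must have equal length"))
    acc = 0.0
    # @inbounds skips bounds checks; @simd permits reordering the
    # floating-point accumulation so the loop can be vectorized.
    @inbounds @simd for i in eachindex(x, y)
        acc += x[i] * y[i]
    end
    return acc
end
```

Because `@simd` allows the additions to be reordered, results can differ in the last bits from a strictly sequential loop.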

Because of this, it goes much further than Cython, and this also makes the type system and multiple dispatch central to the language. So I would say at a surface level it's "fast Python" (or "more productive C", that's how I usually think of it). However, at a deeper level the type system is so central that larger software architectures will be different to accommodate this multiple dispatch style as opposed to OOP.

I don't have any purpose for Julia, beyond being a language geek, but it surely looks like a kind of second coming of Dylan, regarding how everything is put together.

Looking forward to seeing how it evolves.

Thank you! :-D Such a great answer, wish I could +100 that.
I have been interested to see how Apache Arrow has targeted SIMD enablement with data layout. Is there a sweet spot using Julia with Arrow?
This should be on the julialang.org front page.
Superficially you could say that, but the design of Julia is such that it is much easier for you to grasp what is going on at the lower levels and the performance implications of the code you write.

Python is much more of a black box, performance-wise.

Also, something like Cython is radically different from Julia. Cython is compiled to C code. Julia is based on a JIT, so while you have an interactive session in the REPL, functions are continuously compiled as you make them.

I've gotten Cython-like speeds for non-parallelized code in a gradient descent task. I haven't been able to achieve the same speed as parallelized Cython code in Julia using either pmap or @parallel though.

I wouldn't say it's a fast Python.

Edit: Grammar

You weren't doing the same thing. Julia's `pmap` and `@parallel` are multiprocessing. These will parallelize across multiple computers, like multiple nodes of a cluster. It has much larger scaling potential (it's more like MPI) but at the cost of a larger overhead (like MPI). For example, it was used to achieve >1 petaflops in the Celeste.jl application on the Cori supercomputer.

Cython's parallelism is via OpenMP which is shared memory multithreading. Of course multithreading is faster, but it's restricted to a single computer. Julia does have multithreading as well via `Threads.@threads`. This is shared memory and restricted to a single computer just like Cython, and will have a lot lower overhead than `pmap` and `@parallel`. If you want to directly compare something to Cython's parallelism, this is what you should be looking at.
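
A minimal sketch of the `Threads.@threads` form, for comparison with an OpenMP-style parallel loop. The function name `threaded_square!` is hypothetical, and the snippet assumes Julia was started with more than one thread (for example via the `JULIA_NUM_THREADS` environment variable); with one thread it still runs, just serially.

```julia
using Base.Threads

# Hypothetical example: square each element, splitting iterations across threads.
function threaded_square!(out::Vector{Float64}, xs::Vector{Float64})
    # Each iteration writes to a distinct index, so no locking is needed.
    @threads for i in eachindex(out, xs)
        out[i] = xs[i]^2
    end
    return out
end

result = threaded_square!(zeros(4), [1.0, 2.0, 3.0, 4.0])
```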

On a side note, it looks like Cython doesn't have any native multiprocessing or multinode parallelism that would be the direct comparison to `pmap` or `@parallel`.

Thanks! From the documentation and all the threads I searched on Discourse, it was never clear to me that @parallel and pmap were aiming in the direction you just described.

I did try Threads.@threads, but the overhead was way too high. I might look into it again soon.

Threads.@threads is still considered a bit experimental and there are a few performance pitfalls that one can stumble into. I talked about this for a bit in http://slides.com/valentinchuravy/julia-parallelism, but if you still have issues after reading that, feel free to reach out on https://discourse.julialang.org and we will figure out what is going on.
Very technical presentation, but it contains nuggets that I can't wait to try once I get home! Do you have a blog that lays this out in a way that is aimed at the general Julia programmer?
Unfortunately slides.com is blocked at work...
The problem isn't overhead but that there's a performance bug that one can easily hit with multithreading right now, which is why it's labelled experimental. When that's fixed hopefully you'll be happy :). A function barrier fixes it, but it's a little nasty. This is probably the bug I want fixed most, but since it's not syntax breaking it's slated for v1.x and not v1.0.
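
For readers unfamiliar with the term, here is a generic sketch of the function-barrier pattern applied to a threaded loop. It does not reproduce the specific bug mentioned above; the names `kernel!` and `threaded_apply!` are hypothetical. The idea is that the loop body does nothing except call a separate function, so the actual work is compiled with concrete argument types.

```julia
using Base.Threads

# The "barrier": all real work lives in this separately compiled function,
# where the argument types are known concretely.
function kernel!(out, xs, i)
    out[i] = sin(xs[i]) + 1.0
    return nothing
end

function threaded_apply!(out, xs)
    @threads for i in eachindex(out, xs)
        kernel!(out, xs, i)  # the loop body only forwards to the kernel
    end
    return out
end

applied = threaded_apply!(zeros(3), [0.0, 0.0, 0.0])
```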