Hacker News

by fermienrico 1911 days ago

Are the performance claims of Julia greatly exaggerated?

Julia loses almost consistently to Go, Crystal, Nim, Rust, Kotlin, Python (PyPy, Numpy): https://github.com/kostya/benchmarks

Is this because of bad typing, or because they didn't use Julia in an idiomatic manner?

8 comments

stabbles 1911 days ago

I think it's more interesting to see what people do with the language than to focus on microbenchmarks. There's for instance this great package https://github.com/JuliaSIMD/LoopVectorization.jl which exports a simple macro `@avx` which you can stick on loops to vectorize them in ways better than the compiler (=LLVM). It's quite remarkable that you can implement this in the language as a package, as opposed to having LLVM improve or the Julia compiler team figure this out.

See the docs which kinda read like blog posts: https://juliasimd.github.io/LoopVectorization.jl/stable/

And then replacing the matmul.jl with the following:

    @avx for i = 1:m, j = 1:p
        z = 0.0
        for k = 1:n
            z += a[i, k] * b[k, j]
        end
        out[i, j] = z
    end

I get a 4x speedup from 2.72s to 0.63s. And with @avxt (threaded) using 8 threads it goes down to 0.082s on my AMD Ryzen CPU. (So this is not dispatching to MKL/OpenBLAS/etc). Doing the same in native Python takes 403.781s on this system -- haven't tried the others.
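
For anyone without LoopVectorization installed, a rough Base-only counterpart to the loop above can be sketched like this. The name `matmul_simd!`, the size checks, and the use of `@inbounds`/`@simd` are assumptions for illustration, not the benchmark's actual code, and this version should not be expected to match `@avx` in speed.

```julia
# Sketch only: a naive triple-loop matmul using Base macros instead of @avx.
# `matmul_simd!` is a hypothetical name, not part of the benchmark repo.
function matmul_simd!(out, a, b)
    m, n = size(a)
    n2, p = size(b)
    n == n2 || throw(DimensionMismatch("inner dimensions must match"))
    size(out) == (m, p) || throw(DimensionMismatch("output has the wrong size"))
    for j in 1:p, i in 1:m
        z = zero(eltype(out))
        # @simd annotates the innermost loop and permits reordering of the
        # accumulation; @inbounds skips bounds checks (sizes were checked above).
        @simd for k in 1:n
            @inbounds z += a[i, k] * b[k, j]
        end
        out[i, j] = z
    end
    return out
end

a = [1.0 2.0; 3.0 4.0]
b = [5.0 6.0; 7.0 8.0]
out = zeros(2, 2)
matmul_simd!(out, a, b)   # out is now [19.0 22.0; 43.0 50.0]
```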

SatvikBeri 1911 days ago

I've rewritten two major pipelines from numpy-heavy, fairly optimized Python to Julia and gotten a 30x performance improvement in one, and 10x in the other. It's pretty fast!

paul_milovanov 1911 days ago

looks like they're just multiplying two 100x100 matrices, once? (maybe I'm reading it wrong?) in Julia, runtime would be dominated by compilation + startup time.

A fair comparison with C++ would be to at least include the compilation/linking time into the time reported.

Ditto for Java or any JVM language (you'd have JVM startup cost but that doesn't count the compilation time for bytecode).

Generally, for scientific computing benchmarks like this you want to run a lot of computation precisely to avoid this problem (i.e. you want to allow the cost of compilation & startup to amortize fairly).
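
A minimal sketch of the amortization point: timing the same call twice in one session shows the one-off compilation cost separately from the steady-state cost. `sumsq` and the input size are made up for illustration, and the actual timings depend on the machine.

```julia
# Sketch: the first call to a function pays for JIT compilation;
# later calls with the same argument types reuse the compiled code.
sumsq(xs) = sum(x -> x * x, xs)

xs = rand(10_000)
t_first  = @elapsed sumsq(xs)   # includes compiling sumsq for Vector{Float64}
t_second = @elapsed sumsq(xs)   # runs the already-compiled code
println("first call:  ", t_first, " s")
println("second call: ", t_second, " s")
```

A benchmark that only times one short call will mostly report the first number.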

StefanKarpinski 1911 days ago

This appears to be a set of benchmarks of how fast a brainfuck interpreter implemented in different programming languages is on a small set of brainfuck programs? What a bizarre thing to care about benchmarks for. Are you planning on using Julia by writing brainfuck code and then running it through an interpreter written in Julia?

fermienrico 1911 days ago

Seems like you're the founder of Julia. Why such a knee jerk reaction? Did you read the benchmark page? The table of content is right at the top.

Optics of this type of reaction is seen everywhere in the Julia community. My advice is to embrace negativity around the language, try to understand if it is fabrication or legitimate, and address the shortcomings.

Julia is a beautiful language and I hope some of the warts of the language get fixed.

StefanKarpinski 1911 days ago

When I wrote that I was under the impression that the brainfuck interpreter implementations were the only benchmarks in the repo. There are, however (I now realize), also benchmarks for base64 decoding, JSON parsing, and writing your own matmul (rather than calling a BLAS matmul, which is not generally recommended), so this is more reasonable than I thought but still a somewhat odd collection of tasks to benchmark. Of course, microbenchmarks are hard — they are all fairly arbitrary and random.

In a delightful twist, it seems that there is a Julia implementation of a Brainfuck JIT that is much faster than the fastest interpreter that is benchmarked here, so even by this somewhat esoteric benchmark, Julia ends up being absurdly fast.

https://news.ycombinator.com/item?id=26585042

neolog 1911 days ago

I'm a daily Julia user but tbh I've gotta agree with parent commenter. I think Jeff's attitude in the "What's bad about Julia" talk is the right way to handle criticism: listen to the person, ask about their use cases, understand how Julia could be improved for that user. Accepting criticism makes a good product, and seeing project leaders do it makes a good impression.

yellowcake0 1910 days ago

Oh come now, are we really so delicate that one brusque comment gets our back up?

Does the man have to be obsequious every time he discusses his language in an informal setting?

neolog 1910 days ago

It's not that we're delicate, it's that poor communication between users and maintainers causes problems. As for "one comment", OP already mentioned that defensiveness is becoming an issue in the community.

StefanKarpinski 1911 days ago

Not my best but as I said, I was genuinely confused about the benchmarks.

CyberDildonics 1911 days ago

Don't be ridiculous. If someone puts emphasis on nonsense, dismissing it is reasonable.

tgv 1911 days ago

Idk, but just a few weeks ago I started looking at Julia, partly because of the performance claims. I wanted to write a program a bit heavier than your average starter program, so I wrote a back-tracker (automatic layout for stripboards, to be precise). It was

* interesting (not fun) to find out how Julia works

* annoying AF to discover that much of the teaching material was hidden behind some 3rd party website, presumably in videos (I didn't bother to register, but started browsing the manual instead). What's wrong with text?

* unnecessarily complex because the documentation for the basic functions is nearly inaccessible to beginners.

But, I managed to get a simple layout system up and running, and it wasn't fast. I rewrote it in Go (the language in which I'm currently working most), and it was literally >100x faster. And that should not be due to the startup costs, because a backtracker shouldn't have that much overhead JIT-ing.

I think I can now say that I can't see the use case for Julia. "Faster than Python" is simply not good enough, and for the rest there are no redeeming features. Perhaps the fabled partial differential equation module is worth it, but that can get ported to other languages, I guess.

oivey 1911 days ago

Your relative skill and time invested in Julia vs Go makes that a not very fair comparison, I think. A 100x difference in performance is probably a sign of something that could be fixed in your code (common one: type instability). In general, Julia is being used to implement things like competitive versions of BLAS. Your Julia code can almost certainly be made much faster.

Coming from a Python and C++ background, I found it sufficient to just read the docs and do some Advent of Code problems to get productive in Julia. What videos are you talking about? https://docs.julialang.org/en/v1/manual/performance-tips/ I found to be a pretty good document on why and when Julia can be slow.

DNF2 1911 days ago

I simply do not understand how some people are able to form so strong opinions in such a short time, and spew out disdain and negativity on the most flimsy basis. It's a matter of temperament, I guess.

Julia performance should be on par with Go; if it's slower, read the performance tips in the manual. As for teaching material on 3rd party websites, I don't know what you mean. The Julia manual is available from the julialang.org website.

As for re-writing DifferentialEquations, that is extremely strongly tied to the multiple dispatch paradigm, re-writing it would be hard. What you can get is wrappers like diffeqpy and diffeqr, which call out to Julia.

tgv 1911 days ago

You can verify that the teaching materials are not really up to scratch. Even Nim and Zig, which have fewer resources behind them, I think, do a better job there. The manual is a reference manual, and it was difficult to find all the operations on arrays. E.g., the difference between Array{Int} and Array{Int,1} is not clarified from the start.

And as I said: I wrote a straightforward backtracker. It's just recursive function calls: check a possible state for the current item, and when successful, update the overall state and move on to the next item; on return, try another state for the current item, until the search space is exhausted. There's not a lot to optimize, nor is there a lot of work for a JIT compiler.
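
For reference, the shape of backtracker described above can be sketched as follows. This is not the stripboard layout code, which isn't shown in the thread; N-queens is a stand-in problem and `place!`, `is_safe` and `count_queens` are hypothetical names.

```julia
# Sketch of a recursive backtracker, using N-queens as a stand-in problem.
# cols[r] holds the column of the queen placed in row r (0 = not placed).
function place!(cols::Vector{Int}, row::Int, n::Int)
    row > n && return 1                # every item placed: one complete solution
    found = 0
    for c in 1:n                       # candidate states for the current item
        if is_safe(cols, row, c)
            cols[row] = c              # update the overall state
            found += place!(cols, row + 1, n)   # move on to the next item
            cols[row] = 0              # on return, undo and try another state
        end
    end
    return found
end

# A candidate column is safe if no earlier queen shares its column or diagonal.
function is_safe(cols, row, c)
    for r in 1:row-1
        cr = cols[r]
        (cr == c || abs(cr - c) == row - r) && return false
    end
    return true
end

count_queens(n) = place!(zeros(Int, n), 1, n)

count_queens(8)   # 92
```

All state lives in a `Vector{Int}` and function arguments, so every type here is concrete; a version keeping its state in untyped globals or abstractly typed containers would look the same but compile to much slower code.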

> on the most flimsy basis

I've got more gripes. Forward type declaration to name one. But I'm not spewing disdain: I just don't see Julia take a larger role in general software development.

DNF2 1911 days ago

I have no particular opinion on the teaching materials, I just use the manual and the discussion fora, so I don't know. But if a third party offers teaching materials, it's not so strange if it resides on their third party website.

As for performance, I'm not really talking about 'optimization'. Your implementation may simply have used some pattern that should be avoided, such as global variables, type instabilities, abstract types in structs, or some inappropriate data structures. If it's a microbenchmark, then there are some things to keep in mind.

These are not really optimizations, but basic performance principles. I cannot know that you are unaware of them, but your statement that 'there's not a lot to optimize' makes me suspect that this could be the case. The unusual thing about Julia is that it's both dynamic and compiled, so that code that would simply not compile in static languages instead ends up slow.
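
Two of the patterns named above, sketched with made-up names (`LooseCell`, `TightCell`, `scale`, `scaled_global` and `scaled_arg` are all hypothetical):

```julia
# Abstract field type: the compiler only knows `value` is some Real.
struct LooseCell
    value::Real
end

# Parametric field type: each TightCell{T} has a concrete field type.
struct TightCell{T<:Real}
    value::T
end

fieldtype(LooseCell, :value)            # Real (abstract)
fieldtype(TightCell{Float64}, :value)   # Float64 (concrete)

# A non-const global can be rebound to a value of any type at any time,
# so code reading it cannot be specialized on its type.
scale = 2.0
scaled_global(x) = scale * x

# Passing the value as an argument lets the compiler specialize on its type.
scaled_arg(x, s) = s * x
```

Both versions give the same answers; the difference is only in what the compiler can assume when it generates code.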

oivey 1910 days ago

If I had to guess, your problem is type stability. Are you using NamedTuples to store your state and items you’re iterating over? If the keys are not all the same and the value types don’t stay the same (e.g. something initialized as zero(Int) and then accumulated into with Float64s) then performance will suffer. Another possibility is that you have a data type that is not concrete in an inner loop. For example, Array{Real} will be slower than Array{Float64} because an array of Reals has to support arrays mixing Float32 and Float64. If you had this in a function definition the likely correct thing to do is Array{<:Real}, which means the element type of the array must be a subtype of Real. Maybe even better, just drop the type annotations. They very, very rarely improve performance because Julia relies on compile time type inference.

Failed or bad type inference is almost always the cause of performance issues in Julia. Getting a feel for when the compiler can infer things or not takes practice, but it’s a lot easier than the semantics of generic programming systems IMO.
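
A small sketch of the accumulator case, with hypothetical function names. `Base.return_types` is an unexported helper that reports what inference concludes; it is used here only to make the difference visible.

```julia
# The accumulator starts as an Int and becomes a Float64 inside the loop,
# so its type is not fixed for the whole function.
function unstable_sum(xs)
    acc = 0
    for x in xs
        acc += x
    end
    return acc
end

# Starting from zero(eltype(xs)) keeps the accumulator's type fixed.
function stable_sum(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x
    end
    return acc
end

# Ask inference what each method can return for a Vector{Float64}.
Base.return_types(unstable_sum, (Vector{Float64},))  # a Union of Int64 and Float64
Base.return_types(stable_sum, (Vector{Float64},))    # Float64
```

Julia can often handle a small two-type union like this one reasonably well, so treat it as an illustration of the mechanism rather than a guaranteed slowdown; the cost grows when the inferred type widens to something abstract or to `Any`.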

The REPL is really great for learning. If you type “Array{Int} == Array{Int, 1}” the result is false. If you type “?Array” it prints the docstring which gives some guidance on how to use one versus the other.
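
The type relationships mentioned in this comment can be checked directly. A short sketch (`accepts_reals` is a hypothetical function name):

```julia
# Array{Int} leaves the number of dimensions open; Array{Int,1} fixes it to 1.
Array{Int} == Array{Int, 1}          # false
Vector{Int} === Array{Int, 1}        # true: Vector is an alias for the 1-d case
Array{Int, 1} <: Array{Int}          # true
isconcretetype(Array{Int})           # false
isconcretetype(Array{Int, 1})        # true

# Element types: Real is abstract, Float64 is concrete.
isconcretetype(Real)                 # false
Vector{Float64} <: Vector{Real}      # false: parametric types are invariant
Vector{Float64} <: Vector{<:Real}    # true

# Accepts any array whose element type is some subtype of Real.
accepts_reals(xs::Array{<:Real}) = sum(xs)
accepts_reals([1.0, 2.0])            # works for Vector{Float64}
accepts_reals([1 2; 3 4])            # and for Matrix{Int}
```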

otde 1911 days ago

I think this particular Julia code is pretty misleading, and I'm (probably) one of the most qualified people in this particular neck of the woods. I wrote a transpiler for Julia that converts a Brainfuck program to a native Julia function at parse time, which you can then call like you would any other julia function.
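
To give a feel for the general idea, here is a minimal sketch of translating Brainfuck into a Julia function. It is not GalaxyBrain's implementation: `bf_to_expr` and `compile_bf` are hypothetical names, the translation happens at run time via `eval` rather than at parse time, there are no optimizations, and the `,` input instruction is ignored.

```julia
# Translate Brainfuck source into a Julia expression, one statement per instruction.
function bf_to_expr(src::AbstractString)
    stack = [Expr(:block)]                  # stack[end] is the body being filled
    for ch in src
        body = stack[end]
        if ch == '>'
            push!(body.args, :(ptr += 1))
        elseif ch == '<'
            push!(body.args, :(ptr -= 1))
        elseif ch == '+'
            push!(body.args, :(mem[ptr] += 0x01))   # UInt8 arithmetic wraps
        elseif ch == '-'
            push!(body.args, :(mem[ptr] -= 0x01))
        elseif ch == '.'
            push!(body.args, :(write(out, mem[ptr])))
        elseif ch == '['
            push!(stack, Expr(:block))      # start collecting a loop body
        elseif ch == ']'
            length(stack) > 1 || error("unmatched ']'")
            inner = pop!(stack)
            push!(stack[end].args, :(while mem[ptr] != 0x00; $inner; end))
        end                                 # any other character is a comment
    end
    length(stack) == 1 || error("unmatched '['")
    return stack[1]
end

# Wrap the translated body in an anonymous function and let Julia compile it.
function compile_bf(src::AbstractString; memory_size::Int = 30_000)
    body = bf_to_expr(src)
    return eval(quote
        function (out::IO)
            mem = zeros(UInt8, $memory_size)
            ptr = 1
            $body
            return nothing
        end
    end)
end

hello = compile_bf("++++++++[>++++++++<-]>+.")   # 8 * 8 + 1 = 65
io = IOBuffer()
Base.invokelatest(hello, io)   # invokelatest avoids world-age issues after eval
String(take!(io))              # "A"
```

Because the Brainfuck loops become ordinary Julia `while` loops over a `Vector{UInt8}`, the result goes through the same compiler as hand-written Julia code, which is where the speed comes from.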

Here's code I ran, with results:

  julia> using GalaxyBrain, BenchmarkTools

  julia> bench = bf"""
      >++[<+++++++++++++>-]<[[>+>+<<-]>[<+>-]++++++++       
      [>++++++++<-]>.[-]<<>++++++++++[>++++++++++[>++
      ++++++++[>++++++++++[>++++++++++[>++++++++++[>+       
      +++++++++[-]<-]<-]<-]<-]<-]<-]<-]++++++++++."""

  julia> @benchmark $(bench)(; output=devnull, memory_size=100)
  BenchmarkTools.Trial: 
    memory estimate:  352 bytes
    allocs estimate:  3
    --------------
    minimum time:     96.706 ms (0.00% GC)
    median time:      97.633 ms (0.00% GC)
    mean time:        98.347 ms (0.00% GC)
    maximum time:     102.814 ms (0.00% GC)
    --------------
    samples:          51
    evals/sample:     1

  julia> mandel = bf"(not printing for brevity's sake)"

  julia> @benchmark $(mandel)(; output=devnull, memory_size=500)
  BenchmarkTools.Trial: 
    memory estimate:  784 bytes
    allocs estimate:  3
    --------------  
    minimum time:     1.006 s (0.00% GC)
    median time:      1.009 s (0.00% GC)
    mean time:        1.011 s (0.00% GC)
    maximum time:     1.022 s (0.00% GC)  
    --------------
    samples:          5
    evals/sample:     1

Note that, conservatively, GalaxyBrain is about 8 times faster than C++ on "bench.b" and 13 times faster than C on "mandel.b," with each being the fastest language for the respective benchmarks. In addition, it allocates almost no memory relative to the other programs, which measure memory usage in MiB.

You could argue that I might see similar speedup for other languages on my machine, assuming I have a spectacularly fast setup, but this person ran their benchmarks on a tenth generation Intel CPU, whereas mine's an eighth generation Intel CPU:

  julia> versioninfo()
    Julia Version 1.5.1
    Commit 697e782ab8 (2020-08-25 20:08 UTC)
    Platform Info:
      OS: Linux (x86_64-pc-linux-gnu)
      CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
      WORD_SIZE: 64
      LIBM: libopenlibm
      LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

This package is 70 lines of Julia code. You can check it out for yourself here: https://github.com/OTDE/GalaxyBrain.jl

I talk about this package in-depth here: https://medium.com/@otde/six-months-with-julia-parse-time-tr...

StefanKarpinski 1911 days ago

I love the Julia community

Sukera 1911 days ago

@btime reports the minimum execution time, since all increases are attributable to noise. Use @benchmark to get mean, median and maximum instead.

otde 1911 days ago

Thank you! Edited to fix.

eigenspace 1911 days ago

Beautiful

BlackFingolfin 1911 days ago

This is really cool!

But note that OP uses larger cells (`int` = 32 bit in the C version, `Int` = 64 bit in the Julia version) while GalaxyBrain seems to use 8 bit cells. Not that I expect this to make a major difference (but perhaps a minor one?)

otde 1911 days ago

The real issue is that the original brainfuck spec (as given by the Wikipedia entry) explicitly sets the size of each cell to a single byte, which means many of the interpreters used for this benchmark are using incorrect cell sizes!
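
Why the cell width matters for correctness, not just speed, can be seen from a small sketch (variable names are made up): incrementing a cell that holds 255 gives different results depending on the cell type.

```julia
# Incrementing a cell holding 255 under different cell widths.
cell8  = 0xff + 0x01              # UInt8 arithmetic wraps around to 0x00
cell32 = Int32(255) + Int32(1)    # 256
cell64 = 255 + 1                  # 256 (Int is 64-bit on a 64-bit machine)

# A Brainfuck program that relies on 255 + 1 wrapping to 0 therefore behaves
# differently on an interpreter with wider cells.
```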

fishmaster 1911 days ago

That is very cool.

adgjlsfhk1 1911 days ago

They are measuring compile time, not runtime speed.

stabbles 1911 days ago

They are measuring compile time and runtime speed, not just runtime speed like for statically compiled languages.

stjohnswarts 1911 days ago

Is that truly accurate though? I could see them comparing say load time of data files plus execution time but combining compile times in there doesn't make much sense. You always have to pay for it in Julia but not with a statically compiled file.

dklend122 1911 days ago

You only pay for it on the first run.

dunefox 1911 days ago

Where does it say that?

ZeroCool2u 1911 days ago

I'm not a huge Julia user, but typically if they don't specifically mention they're segmenting runtime from compilation time with Julia, that's a bit of a red flag, because unlike Rust, Go, or C++ the compilation step isn't separate in Julia. To the user it just looks like it's running, when in reality it's compiling, then running, without really letting you know in between.

cygx 1911 days ago

In the matrix multiplication example, the measurement is done via a simple

    t = time()
    results = calc(n)
    elapsed = time() - t

So startup time at least isn't included.

One might argue that this is still biased against Julia due to its compilation strategy, but fixing that would mean you'd have to figure out what the appropriate way to get 'equivalent' timings for any of the other languages would be as well - something far more involved than just slapping a timer around a block of code in all cases...

edit: As pointed out below, the Julia code should indeed already have been 'warmed up' due to a preceding sanity check. My apologies for 'lying'...

3JPLW 1911 days ago

The problem is a minor placement issue for the `@simd` macro: https://github.com/kostya/benchmarks/pull/317

komuher 1911 days ago

If u cant even read code dont lie xD

    n = length(ARGS) > 0 ? parse(Int, ARGS[1]) : 100
    left = calc(101)  # <------- THIS IS COLD START JITTING CALL
    right = -18.67
    if abs(left - right) > 0.1
        println(stderr, "$(left) != $(right)")
        exit(1)
    end

    notify("Julia (no BLAS)\t$(getpid())")
    t = time()
    results = calc(n)
    elapsed = time() - t
    notify("stop")

stabbles 1911 days ago

Ah, I have to take that back, since benchmarks run in the order of seconds and they use sockets to start and stop the timer, which likely means compilation time is not included.

machineko 1911 days ago

I think I can answer that: first of all, Julia isn't as fast as C/C++/Nim etc. in most cases; Julia is just fast in scientific computing, that's all. (There is only one "scientific" benchmark in the kostya benchmarks.)

Second to write very fast julia u need to know a lot of "tricks" and in most cases u won't be doing it as easy as writing normal code.

And everyone writing that this benchmark is measuring compilation time (XD?) or not including jitting time could just look at the code/readme for 5s before commenting.

Julia is fast and can be as fast as C but not in all cases and not as easy as it seems.

snicker7 1911 days ago

> Second to write very fast julia u need to know a lot of "tricks" and in most cases u won't be doing it as easy as writing normal code.

That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.

In terms of performance, Julia provides the following:

1. Zero-cost abstractions. And since it has homoiconic macros, users can create their own zero-cost abstractions, e.g. AoS to SoA conversions, auto-vectorization. Managing the complexity-performance trade-off is critical. But you don't see that in micro-benchmarks.

2. Fast iteration speed. Julia is optimized for interactive computing. I can compile any function into its SSA form, LLVM IR, or native assembler. And I can inspect this in a Pluto notebook. Optimizing Julia is fun, which is less true in other languages.
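
A minimal sketch of the inspection workflow from point 2, using the function forms from Base and the `InteractiveUtils` standard library (`axpy` is a made-up example function; at the REPL the macro forms such as `@code_llvm axpy(1.0, 2.0, 3.0)` are the usual way to do this):

```julia
using InteractiveUtils  # loaded automatically in the REPL, needed in scripts

axpy(a, x, y) = a * x + y
argtypes = (Float64, Float64, Float64)

lowered = code_lowered(axpy, argtypes)          # lowered form, before type inference
typed   = code_typed(axpy, argtypes)            # inferred IR paired with the return type
llvm_ir = sprint(code_llvm, axpy, argtypes)     # LLVM IR as a String
native  = sprint(code_native, axpy, argtypes)   # native assembly as a String

println(typed[1].second)   # the inferred return type
print(llvm_ir)
```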

RyEgswuCsn 1911 days ago

> That's true in literally any language. Some languages require inlined assembly. Others require preprocessor directives. In almost all languages, you need to understand the difference between stack and heap, know how to minimize allocations, know how to minimize dynamic dispatch, know how to efficiently structure cache-friendly memory layouts. And of course, data structures & algorithms 101.

I think what s/he meant to say is that Julia is not "magically" faster than other languages. The real questions are:

1. Can unoptimised Julia code run as fast as unoptimised c/c++ code? I think the linked benchmark suggests this is not really the case.

2. Can optimised Julia code run faster than comparably (i.e. requiring similar amount of effort and expertise) optimised c/c++ code? If not, then why use Julia?

mbauman 1910 days ago

> Julia is not "magically" faster than other languages

That's somewhat true, and is at the end-point of some mismatched expectations when folks come to Julia. Julia is a high-level dynamic language whose semantics are conducive to creating the ~same performance as static languages.

So if your unoptimized Julia program relies upon traditional "dynamic" features like `Any[]` arrays, then you should expect to see dynamic- (read: python-) like performance out of Julia. Julia should match performance of other dynamic languages here, but the compiler doesn't have all the typical dynamic optimizations because, well, it's often easy to write your code in a manner that ends up hitting the happy path that gets the static-like performance.

Conversely, if your dynamic language baseline is just glue to an optimized static library, then you should expect to see static-like (read: C/C++-like) performance out of your dynamic language. Julia really should match performance here, and if it doesn't, open an issue: it's a bug.

Where Julia truly excels are the cases where you don't have a library implementation (like numpy) to lean on and find yourself writing a hot `for` loop in a dynamic language. Further, it excels at facilitating library creation, leading to more and more first-class ecosystems that are best-in-class like DiffEq.
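
The `Any[]` case mentioned above, as a small sketch with made-up names. The same hot loop is compiled very differently depending on the container's element type; `Base.return_types` (an unexported helper) is used only to show what inference concludes.

```julia
function total(xs)
    s = 0.0
    for x in xs
        s += x
    end
    return s
end

boxed  = Any[1.0, 2.0, 3.0]       # element type unknown: each `+` is dispatched at run time
floats = Float64[1.0, 2.0, 3.0]   # element type known: the loop can compile to plain float adds

total(boxed) == total(floats)     # true: same answer either way

Base.return_types(total, (Vector{Any},))      # Any
Base.return_types(total, (Vector{Float64},))  # Float64
```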

snicker7 1910 days ago

> So if your unoptimized Julia program relies upon traditional "dynamic" features like

Dynamic dispatch is slow in any language, including C/C++ (provided that the compiler can't devirtualize the method). This is why such things are never done in an inner loop.

In C++, it's harder to "accidentally" use dynamic dispatch because you have to explicitly annotate a function as being virtual. In Julia, which is much more concise, type stability or instability is implicit. But it can be inspected statically via @code_warntype. Good IDE plug-ins can make it easier.
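
A minimal sketch of that inspection, using the function form `code_warntype` from the `InteractiveUtils` standard library so it also works in a script (`pick` is a made-up example):

```julia
using InteractiveUtils  # provides code_warntype / @code_warntype outside the REPL

# Returns an Int on one branch and a Float64 on the other.
pick(flag::Bool) = flag ? 1 : 2.0

# At the REPL, `@code_warntype pick(true)` prints the same report with the
# non-concrete types highlighted.
report = sprint(code_warntype, pick, (Bool,))
print(report)
```

The `Body::` line of the report shows a `Union` of the two return types, which is the signal to look for.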

bananaquant 1909 days ago

Julia optimizes for a different thing. You can get your result, as in the actual useful thing that the code does/produces, much faster than with C/C++. You can skip type annotations, not worry about the memory usage, and write your code interactively using REPL or the excellent Revise.jl package.

If you have saved a couple of minutes or hours of coding and are only going to run that code a handful of times, it should not matter if it runs a second or two slower than C/C++. This is the same rationale that Python and other scripting languages have. But unlike Python, you should be able to match the speed of C/C++ or get pretty close by optimizing your code.

RyEgswuCsn 1909 days ago

Yes I get your point. I guess I should have phrased my first question like the following

1. Can unoptimised Julia code run faster than unoptimised Python code (with numpy being used to do the heavy lifting)?

Let's say one is prototyping some algorithm so iteration speed is more relevant than running speed. Then one can choose either Julia or Python (with the help of numpy perhaps) and get an implementation in similar timeframes. So Julia won't necessarily be more attractive here.

Now if the prototype proved that running speed is very critical to the successful application of the algorithm, then it would mean the developer now has to optimise the hell out of it. One can either:

1. Optimise the Julia codebase, if Julia was used to prototype, following the many tips and tricks available (e.g. type stability, various macros, etc.).

2. Port the algorithm to C/C++, applying the many performance best practices that people have accumulated over the years.

So if the optimised C/C++ port is capable of being any faster than the optimised Julia code, then the rational choice would be to port the implementation using C/C++; it would also mean Python would have some advantage over Julia in the prototyping phase too due to its popularity. Otherwise I'd agree that using a single language to both do prototyping and production is the best.

DNF2 1908 days ago

This depends on what you mean by '(un)optimized code'. Because there's a difference between unoptimized and naive code.

'Unoptimized' code should still observe most of the performance tips in the manual (such as avoiding globals and type instability), while 'naive' code frequently does not. With some experience, you never write naive code, even for quick prototypes.

In those cases, Julia should outperform other dynamic languages significantly, and approach static languages in most cases.

Proper optimization means going in and removing allocations, ensuring that operations vectorize (simd), tailoring data structures for performance, adding parallelism etc. In the latter case Julia should virtually _always_ match static languages closely, otherwise it merits investigation.

socialdemocrat 1911 days ago

To be fair, Julia gives you better tools to analyze your code and figure out how to write more efficient code. Being able to look at all the steps a JIT compiler will perform on an individual function helps a lot in building an intuition about what you should and should not do while writing high performance Julia code.