by gavinray 1735 days ago
Is this using the JVM/JIT, or using Graal to make a native-image binary and calling that? If the measurement includes startup time, you'd get better results with native-image, since it shaves that off.

I think .NET 6 latest preview would slightly beat out Java here as well.

This isn't exactly apples-to-apples since Zig/Rust/Go are considered "systems languages", but the JVM has GraalVM for compiling to native binaries/libraries and even importing/exporting C functions and structs. And .NET of course has .NET Native + "Dotnet Native Exports", including the LLVM experiment, which uses LLVM bitcode instead of RyuJIT.

So you can make the argument that writing a native binary, or a library which exports this benchmark function as a C-callable method with a C header, would be technically equivalent in each language.

The JVM and .NET ones would be large (several MB each) but would otherwise still fulfill this requirement.


It's a plain Java (i.e. JVM/JIT), ForkJoinTask-based implementation. As in the original implementation, the measurement is taken around the quickSort() call.
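For anyone who wants to try this at home, here is a minimal sketch of what a ForkJoinTask-based parallel quicksort with timing around quickSort() could look like. It is not the code that was actually benchmarked: the class name, the cutoff of 32, the middle-element pivot, the input size and the seed are all assumptions picked for illustration.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

public class ParallelQuickSort {
    // Assumed cutoff: ranges smaller than this are sorted sequentially.
    static final int CUTOFF = 32;

    static final class SortTask extends RecursiveAction {
        private final int[] a;
        private final int lo, hi; // inclusive bounds

        SortTask(int[] a, int lo, int hi) {
            this.a = a;
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected void compute() {
            if (hi - lo < CUTOFF) {
                insertionSort(a, lo, hi);
                return;
            }
            int p = partition(a, lo, hi);
            // Runs both halves as subtasks and returns once both are done.
            invokeAll(new SortTask(a, lo, p), new SortTask(a, p + 1, hi));
        }
    }

    // Hoare partition around the middle element; returns j with lo <= j < hi.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) return j;
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }

    // Sorts the array in place on the common ForkJoinPool.
    public static void quickSort(int[] a) {
        ForkJoinPool.commonPool().invoke(new SortTask(a, 0, a.length - 1));
    }

    public static void main(String[] args) {
        // Assumed input: 10 million random ints with a fixed seed.
        int[] data = new Random(1).ints(10_000_000).toArray();

        long start = System.nanoTime();
        quickSort(data); // only this call is timed
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("quickSort took " + elapsedMs + " ms");
    }
}
```

Note that a single timed run like this mostly measures cold JIT code, so for a fair number you would probably want to call quickSort() a few times on fresh copies of the input first and only time the later runs.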

One point is actually that the parallel quick sort algorithm is a bad benchmark for task schedulers (it doesn't scale well for one thing). Another point is, well, that one can spend two years of deep technical work and then be easily beaten by some "legacy" tech in the course of a morning exercise. Maybe those bearded guys were good for something after all :)

First of all, you, along with a few others, misunderstood the goal of the scheduler. I note in the post that it's primarily for async execution. See a previous comment of mine on how a fork-join optimized thread pool, which hooks into the scheduler to wait for the other forked side and runs the poll loop inline, is ideal for a fork-join case, and why I'm intentionally not benchmarking that. Given that rayon::join actually does this, I'd be curious to see the results of that vs Java's ForkJoinPool on your machine to see if the optimizations match up.

Second, parallel quicksort isn't a bad benchmark, and it does scale enough to stress-test the spawning and joining aspects of the scheduler. Keep in mind, the workload that scales best, AFAIK, is one that is embarrassingly parallel and takes enough time to offset the cost of any scheduler overhead and contention. Again, this thread pool is optimized to execute small tasks. To that end, there are indeed better benchmarks, but quicksort with a small-size optimization is the one that is most widely understood.

Finally, you're in the game of trying to invalidate others' work due to novelty, lack of understanding on your part, and generalizations on culture. I'm here to learn about cool scheduler designs. I would appreciate it if you would contribute in that respect instead.

This is most likely using HotSpot as I don’t believe Graal has released anything past Java 11.

I don’t know if native-image would perform better. I’ve mostly found that it performs worse than HotSpot overall, especially once you start generating garbage and the heap gets larger: the Serial GC it uses won’t keep up with G1.

Yeah, I think people seriously underestimate the abilities of the JVM