by z92 2618 days ago
Julia was astonishing. It's a high level language that's performing almost like C.

Last time I checked, many years back, the spec was still changing and the runtime would crash. Guess it has come a long way since.

The other one is Lua. My assumption was that it's one of the lightest and fastest languages around. Looks like "fastest" isn't true in some cases.

4 comments

Different languages' benchmarks might not be equally well-written / optimized. In particular, I'd expect C and Rust to be very close to each other, and a 20% gap between them is a red flag.

Rules like "code should be simple, as in, easy to read and understand" are also hard to judge, especially near the top of the list where there's a lot of pressure to optimize. Is SIMD easy to understand? What if it's in a library? What if the library was written specifically for this benchmark? Etc. I think https://benchmarksgame-team.pages.debian.net/benchmarksgame/ has to deal with every possible permutation of this debate.

Not necessarily. Because C allows arbitrary pointer manipulation, the compiler generally cannot make assumptions about pointer aliasing. This prevents some optimizations.

In Rust, the compiler has more information/control over memory layout/lifetime and can therefore make stronger optimizations.

Automatic vectorization is an area where this helps a lot, and raytracing can benefit a lot here. 20% sounds reasonable to me.
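To make the aliasing point concrete, here is a minimal sketch. The function names are made up for illustration and nothing here is taken from the benchmark. In the first version the compiler has to allow for `k` pointing into `dst`, so it must either reload `*k` after every store or emit a runtime overlap check. Copying the value into a local removes the question.

```c
#include <stddef.h>

/* Hypothetical helper, not from raybench: adds *k to every element.
 * dst and k are both int pointers, so k may point into dst. The
 * compiler cannot assume *k stays constant across the stores. */
void add_k(int *dst, size_t n, const int *k)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += *k;
}

/* Same loop, but the value is read once into a local. A local whose
 * address is never taken cannot alias dst, so the compiler may keep
 * it in a register and vectorize the loop. */
void add_k_copy(int *dst, size_t n, const int *k)
{
    const int kv = *k;
    for (size_t i = 0; i < n; i++)
        dst[i] += kv;
}
```

The two functions really do differ when the pointers overlap: calling each with `k` pointing at `dst[0]` gives different results, which is exactly why the compiler is not allowed to turn the first into the second on its own.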

Well, generally, perhaps. But any performance-oriented C programmer worth his or her salt would be aware of aliasing issues and write code in such a way that it doesn't cause problems for the compiler. Plus, this is a toy benchmark of a few hundred lines, so the compiler can do full-program analysis. So the 20% difference is indeed a smell.

Looking at the crb*.c files, structs are passed by pointer rather than by value. This makes it harder for the compiler to analyze the data flow, which I would bet is part of the reason Rust is faster here.

> pointer aliasing

Unfortunately, due to LLVM bugs, the Rust developers had to disable that optimization, more than once. I don't know whether the "1.13.0-nightly" he used has that optimization enabled or disabled. (See https://github.com/rust-lang/rust/issues/31681 and https://github.com/rust-lang/rust/issues/54878 for the relevant Rust issues.)

That's a good point, I didn't think about the effect on autovectorization. Do you think that's what's happening here? My impression was that getting good vector code out of the compiler usually requires manually tuning things.
shouldn't rust being faster than C be something of a red flag that they aren't quite the same algorithm? Or that the algorithm is sub-optimal?
C and Rust have been trading blows on the benchmarks game for a while now, and the benchmarks game dictates the algorithm used. From my experience, it's relatively easy to accidentally write fast Rust, but incredibly hard to write fast C.

https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

It got me thinking as well, so I ran some experiments and found that the main difference is the RNG algorithm: C's standard library uses a slower one (which is also thread safe, and that butchered OpenMP performance). For a more apples-to-apples comparison, take a look at the latest update to crb.c, which uses a xor128 RNG. Rust is still a little faster (especially when going multithreaded), but not by the margin shown in the README, which I still need to find time to update.
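For anyone curious what a xor128 generator looks like, here is a minimal sketch of Marsaglia's xorshift128 with the state held in a struct. I haven't checked that this matches the version in crb.c. The struct name, the function names and the way the seed is mixed in are my own assumptions. Keeping the state in a caller-owned struct means each thread can have its own generator, so no lock is needed.

```c
#include <stdint.h>

/* Generator state. One instance per thread avoids any shared state. */
struct xor128 {
    uint32_t x, y, z, w;
};

/* Initial constants are the ones from Marsaglia's xorshift paper.
 * XORing the seed into x is an assumption made for this sketch.
 * y, z and w are nonzero, so the state can never be all zero. */
void xor128_seed(struct xor128 *s, uint32_t seed)
{
    s->x = 123456789u ^ seed;
    s->y = 362436069u;
    s->z = 521288629u;
    s->w = 88675123u;
}

/* One xorshift128 step: returns the next 32-bit value. */
uint32_t xor128_next(struct xor128 *s)
{
    uint32_t t = s->x ^ (s->x << 11);
    s->x = s->y;
    s->y = s->z;
    s->z = s->w;
    s->w = s->w ^ (s->w >> 19) ^ t ^ (t >> 8);
    return s->w;
}

/* Uniform float in [0, 1). Only the top 24 bits are used so the
 * result is exactly representable as a float and never reaches 1. */
float xor128_float(struct xor128 *s)
{
    return (float)(xor128_next(s) >> 8) * (1.0f / 16777216.0f);
}
```

In a multithreaded tracer you would seed one `struct xor128` per thread (for example from the thread index) and pass it down to whatever needs random samples.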
FWIW, I looked at some of the quicker C/Rust examples, without too much other analysis:

  crb-vec-omp //I added some #pragma omp to crb-vec
  executable size:
    18k
  time:
    real 0m3.630s
  valgrind: 
    ==17703== HEAP SUMMARY:
    ==17703==     in use at exit: 7,408 bytes in 15 blocks
    ==17703==   total heap usage: 20 allocs, 5 frees, 14,790,856 bytes allocated

  rsrb_alt_mt.rs
  executable size:
      426k
  time: 
    real 0m1.630s
  valgrind: 
    ==7221== HEAP SUMMARY:
    ==7221==     in use at exit: 43,120 bytes in 216 blocks
    ==7221==   total heap usage: 256 allocs, 40 frees, 11,113,784 bytes allocated
And because we have a number of tiny single-CPU VMs out there (which would also benefit from a performant language), I gave it a shot there:

  :~# time ./rsrb_alt_mt
  ./rsrb_alt_mt: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by ./rsrb_alt_mt)

  real 0m0.002s
  user 0m0.002s
  sys 0m0.000s

  :~# time ./crb-vec-omp 

  real 0m24.234s
  user 0m24.160s
  sys 0m0.035s
So this Rust binary doesn't run on CentOS 7.5: it requires GLIBC_2.18, which the system libc doesn't provide (no, I'm not going to edit the binary). That is an insta-deal breaker for us.
Have you tried passing everything by value? That is, instead of:

    bool hit_sphere(const struct sphere* sp, const struct ray* ray, struct hit* hit)
you write:

    static bool
    hit_sphere(struct sphere sp, struct ray ray, struct hit hit)
IME, clang is insanely good at optimizing pass by value calls.
fwiw, I took a stab at replacing a bunch of -> with . and ran it:

  time ./crb-vec-omp
  real 0m0.764s

which is more than twice as fast as the rust example,

but it didn't create the right output...
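One guess at why the output can go wrong after such a conversion (I haven't checked this against crb-vec.c, and the struct layouts below are invented for the sketch): if `hit` is an out-parameter, passing it by value means the callee only fills in a copy. The inputs can go by value, but the result still has to come back through a pointer or the return value. The intersection test here is deliberately simplified to a closest-approach check so the example needs no math library.

```c
#include <stdbool.h>

/* Hypothetical, simplified types; the real raybench structs may differ. */
struct v3 { float x, y, z; };
struct sphere { struct v3 center; float radius; };
struct ray { struct v3 origin, direction; };
struct hit { float t; };

static float dot(struct v3 a, struct v3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static struct v3 sub(struct v3 a, struct v3 b)
{
    return (struct v3){ a.x - b.x, a.y - b.y, a.z - b.z };
}

/* Finds the ray parameter of closest approach to the sphere center and
 * reports a hit if that point lies inside the sphere. This is not the
 * surface intersection a real tracer computes; it only keeps the
 * sketch short. A sphere behind the origin counts as a miss. */
static bool closest_approach(struct sphere sp, struct ray ray, float *t)
{
    struct v3 oc = sub(sp.center, ray.origin);
    float dd = dot(ray.direction, ray.direction);
    float tc = dot(oc, ray.direction) / dd;
    if (tc < 0.0f)
        return false;
    float d2 = dot(oc, oc) - tc * tc * dd; /* squared distance to center */
    if (d2 > sp.radius * sp.radius)
        return false;
    *t = tc;
    return true;
}

/* Lossy conversion: hit is a copy, so the caller never sees hit.t. */
bool hit_sphere_lossy(struct sphere sp, struct ray ray, struct hit hit)
{
    return closest_approach(sp, ray, &hit.t);
}

/* Inputs by value, output still through a pointer. */
bool hit_sphere(struct sphere sp, struct ray ray, struct hit *hit)
{
    return closest_approach(sp, ray, &hit->t);
}
```

Both functions return the same boolean, but only the second one leaves the hit data in the caller's struct, which would be enough to change the rendered image while still running fast.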

If you get bored, would you mind taking a stab at modifying crb-vec.c (https://github.com/niofis/raybench) and adding parallel to it? I definitely think you might be on to something here.

> shouldn't rust being faster than C be something of a red flag that they aren't quite the same algorithm? Or that the algorithm is sub-optimal?

The difference isn't much. And Rust is more like FORTRAN. Maybe a bit faster than C, but can't do the gymnastics with pointers that C can.

> can't do the gymnastics with pointers that C can

It can if you write the "unsafe" keyword, but there's a pretty strong community norm around not doing that sort of thing, unless you can encapsulate it inside some sort of safe API. And to be fair to C, I think C can close the gap with Rust/Fortran if you use the "restrict" keyword a lot?
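For reference, this is roughly what the `restrict` route looks like in C99. The kernel below is a generic example, not code from the benchmark. The qualifier is a promise from the programmer that the two arrays never overlap, which is the kind of information the Rust compiler gets from `&mut` references without being told.

```c
#include <stddef.h>

/* Hypothetical kernel, not from raybench: y[i] += a * x[i].
 * restrict tells the compiler that x and y do not overlap, so it can
 * vectorize the loop without emitting runtime overlap checks.
 * Calling this with overlapping arrays is undefined behavior. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

The catch is that nothing verifies the promise: the compiler trusts it, and a caller that passes overlapping buffers gets silently wrong results.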

With unsafe, you can do anything that C can.

Without unsafe, there’s significantly more aliasing information, which helps optimizations.

Rust is compiled with LLVM, while the C is compiled with GCC, which is a bit more conservative. It's possible to enable the same optimizations for GCC and LLVM, so that their speeds match.
Wasn’t Julia specifically designed to be easy to optimise? It’s not quite like other higher level languages, as they thought about performance first.
That's using version 0.4 of Julia too (current is 1.1). The current version has a lot of improved optimization passes that would likely benefit this benchmark.