by jklontz 3954 days ago
This is _not_ a post about auto-vectorization in Rust (which would have been a lot more interesting!). The provided Mandelbrot Set example is algorithmically similar to loop unrolling with a 4x unroll factor. The strategy works well in this case because neighboring locations in the Mandelbrot Set tend to require similar numbers of iterations to compute.
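To make the comparison concrete, here is a minimal sketch of what "4x unrolling" means for escape-time iteration. It is not the blog post's code: the names `mandel_scalar` and `mandel_x4` are made up for illustration, and plain arrays stand in for a SIMD vector. Four neighbouring points advance in lockstep, and the loop only exits once every lane has escaped, which is why similar iteration counts across neighbours matter.

```rust
/// Scalar escape-time count for a single point c = (cx, cy).
fn mandel_scalar(cx: f32, cy: f32, max_iter: u32) -> u32 {
    let (mut x, mut y) = (0.0f32, 0.0f32);
    let mut n = 0;
    while n < max_iter && x * x + y * y <= 4.0 {
        let xt = x * x - y * y + cx;
        y = 2.0 * x * y + cy;
        x = xt;
        n += 1;
    }
    n
}

/// Four points iterated in lockstep (hypothetical helper, arrays emulate
/// a 4-lane vector). A real SIMD version would replace the per-lane `if`
/// with a comparison mask and a select.
fn mandel_x4(cx: [f32; 4], cy: [f32; 4], max_iter: u32) -> [u32; 4] {
    let mut x = [0.0f32; 4];
    let mut y = [0.0f32; 4];
    let mut count = [0u32; 4];
    for _ in 0..max_iter {
        let mut any_active = false;
        for i in 0..4 {
            // A lane is active while its point has not escaped.
            if x[i] * x[i] + y[i] * y[i] <= 4.0 {
                let xt = x[i] * x[i] - y[i] * y[i] + cx[i];
                y[i] = 2.0 * x[i] * y[i] + cy[i];
                x[i] = xt;
                count[i] += 1;
                any_active = true;
            }
        }
        // The group finishes only when the slowest lane does.
        if !any_active {
            break;
        }
    }
    count
}
```

Each lane returns the same count as `mandel_scalar` for its point, but the group pays for the slowest lane: if one of the four points sits inside the set and the other three escape quickly, most of the lockstep work is wasted.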
3 comments

I'm sorry you didn't find the blog post as interesting as you hoped. :) As others have pointed out, I talk a bit about how rustc gets autovectorisation by leaning on LLVM in the "Explicit SIMD in the Compiler" section.

In any case, I agree the Mandelbrot example isn't so interesting: I included it because it is relatively simple, well-known and gives a pretty picture (i.e. good for a blog post where a single example isn't meant to be the focus). In fact, manual unrolling that caters to autovectorisation is how Rust is currently top of the mandelbrot benchmark on the benchmarks game[1], and it explains the equal performance of the explicit-SIMD and scalar versions of spectral-norm on AArch64 (although the fact that they aren't equal on x86 hints at the lack of guarantees around autovectorisation).

I find examples like matrix inversion, nbody and fannkuch-redux more compelling because the vectorised version is far less similar to the scalar one ("strange" shuffles, approximation of floating point ops, and dynamic byte shuffles with precomputed values, respectively).

[1]: http://benchmarksgame.alioth.debian.org/u64/performance.php?...

This article could use some disassembly (and LLVM IR) from compiled code to see what a piece of Rust SIMD code looks like when compiled for different architectures. No doubt you've done this when debugging, but it would also be useful for the rest of us.

How well does it work in general? When you write SIMD code, can the compiler keep the values in vector registers or is there spilling going on?

As you can see from the benchmarks, it works basically as well as industrial C/C++ compilers like Clang (if not slightly better) and GCC (although GCC's older and more optimised backend leaps ahead of the LLVM-based compilers in some cases).

I'm planning follow-up posts which may involve more assembly/IR, but this is designed to be an introductory, high-level post, and the graphs are meant to serve as a summary of (and replacement for) digging through reams of assembly.

Ehh, auto-vectorization is a much less interesting problem than doing fun things manually.
Does Rust not already get LLVM's current auto-vectorization for free?
From the article:

  > Speaking of the optimiser, rustc uses LLVM, which is industrial
  > strength, and supports a lot of autovectorisation
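For a sense of what "for free" looks like, here is a small sketch of the kind of loop LLVM's autovectoriser can usually handle. The function name `add_slices` is made up for illustration, and whether it actually compiles to vector instructions is an assumption that depends on the target and optimisation level, so check the generated assembly rather than taking it on faith.

```rust
/// Element-wise sum of two slices, written as plain scalar code.
/// The iterator form has no per-element bounds checks, which tends to
/// make the loop easier for the optimiser to vectorise in release builds.
/// Processing stops at the length of the shortest slice.
fn add_slices(a: &[f32], b: &[f32], out: &mut [f32]) {
    for ((o, x), y) in out.iter_mut().zip(a).zip(b) {
        *o = *x + *y;
    }
}
```

Nothing in the source asks for SIMD here; the compiler is free to emit vector code or not, which is the lack of guarantees mentioned upthread.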