by noelwelsh 4636 days ago
Ray-tracing, especially the simple kind in this example, is all about vector maths. CPUs are extremely good at this type of task. Finding Go performs well at this shouldn't be surprising. Any decent compiler will be able to produce good code for this task as it maps very closely to what CPUs do best, meaning you don't need much fancy analysis.

I think the majority of languages in popular use are faster than Python. I believe that Go is popular with the Python / Ruby crowd because idiomatic Go is quite close to what they do already. I.e. you don't need to learn much to shift from Python or Ruby to Go. Using a language like Scala, for instance, is a much bigger jump.


Actually, looking deeply at Go from a performance perspective (as I have been for the last couple of days) has revealed a bunch of low-hanging fruit and missed optimization opportunities in the Go compiler. That was 50% of the intent of the grandparent blog post.

Sure, there's plenty of work still to be done. Go 1.0 is only 18 months old, with 1.1 only 6 months old.

C++ has 30 years of history behind it, MS VC++ is 20 years old, Intel's C++ compiler is at least 10 years old, etc.

> Finding Go performs well at this shouldn't be surprising.

Go doesn't do SIMD at all (see note 1). Personally I leverage Go coupled with the Intel Compiler (Go happily links with and uses very high performance C-built libraries, where I'm rocking out with SSE3 / AVX / AVX2).

To respond to something that Ptacek said above, many of us do expect Go to achieve C-level performance eventually. There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet. There is nothing about the language that prohibits it from a very high level of optimization, and indeed the language is generally sparse in a manner that allows for those optimizations.

*1 - For performance-critical code you are supposed to use gccgo, which uses the same intermediate representation as the C compiler, allowing it to do all of the vectorization and the like. Unfortunately, for this specific code gccgo generates terrible code, yielding a runtime that is orders of magnitude slower (albeit absolutely tiny). I haven't looked into why that is.

> There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet.

Those optimizations would almost certainly reduce the speed of the Go compiler (requiring SSA form and aliasing info).

> There is nothing about the language that prohibits it from a very high level of optimization, indeed the language is generally sparse in a manner that allows for those optimizations.

Autovectorization is very sensitive to good output from alias analysis. This is where the const and restrict keywords in C, absent in Go, are useful. I think you will at least need runtime guards in Go, whereas they are not necessary in well-written C.
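To make the aliasing concern concrete, here is a minimal sketch in Go. The function name `addInto` and the sample values are made up for illustration. When `dst` shares a backing array with `a`, the element-at-a-time loop feeds each result into the next iteration, so a compiler that loaded several elements of `a` at once would compute something different. That is roughly why it needs alias information or a runtime overlap check before vectorizing.

```go
package main

import "fmt"

// addInto computes dst[i] = a[i] + b[i], one element at a time, in order.
// Nothing in the signature tells the compiler whether dst overlaps a or b.
func addInto(dst, a, b []float64) {
	for i := range dst {
		dst[i] = a[i] + b[i]
	}
}

func main() {
	ones := []float64{1, 1, 1}

	// No aliasing: dst has its own backing array.
	a := []float64{1, 0, 0, 0}
	dst := make([]float64, 3)
	addInto(dst, a, ones)
	fmt.Println(dst) // [2 1 1]

	// Aliasing: dst is buf shifted by one, so each store changes
	// the value read by the next iteration.
	buf := []float64{1, 0, 0, 0}
	addInto(buf[1:], buf, ones)
	fmt.Println(buf) // [1 2 3 4]
}
```

A version that read `a[0..2]` up front would have left `buf` as `[1 2 1 1]` in the second case, which is why the sequential semantics have to be protected.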

My understanding is that automatic vectorization is still quite sensitive to how code is written. The compiler may fail to vectorize one implementation of an algorithm yet vectorize another, due to details in the implementation of both the code and the compiler.

My point is not about vectorization though. Code that uses mostly vectors, math, and function calls has a very direct translation to machine code. I expect all compilers to generate approximately the same machine code for this type of code, assuming vectorization doesn't come into play. So I don't expect to see large differences in performance. Of course there will be some difference, but not the order of magnitude one sees between compiled (statically or JITed) languages and interpreted languages.
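As a rough illustration of the kind of code meant here, a sketch of a 3-component vector type in Go follows. The `Vec3` type and its methods are hypothetical and not taken from the benchmarked raytracer. Each method is a handful of floating-point adds and multiplies, which is why scalar code generation for it leaves little room for compilers to differ.

```go
package main

import (
	"fmt"
	"math"
)

// Vec3 is a hypothetical 3-component vector used only for illustration.
type Vec3 struct{ X, Y, Z float64 }

// Add returns the component-wise sum of a and b.
func (a Vec3) Add(b Vec3) Vec3 { return Vec3{a.X + b.X, a.Y + b.Y, a.Z + b.Z} }

// Scale multiplies every component of a by s.
func (a Vec3) Scale(s float64) Vec3 { return Vec3{a.X * s, a.Y * s, a.Z * s} }

// Dot returns the dot product of a and b.
func (a Vec3) Dot(b Vec3) float64 { return a.X*b.X + a.Y*b.Y + a.Z*b.Z }

// Norm returns a scaled to unit length (a must be non-zero).
func (a Vec3) Norm() Vec3 { return a.Scale(1 / math.Sqrt(a.Dot(a))) }

func main() {
	a := Vec3{1, 2, 3}
	b := Vec3{4, 5, 6}
	fmt.Println(a.Dot(b)) // 32
	fmt.Println(a.Add(b)) // {5 7 9}
	fmt.Println(Vec3{3, 0, 4}.Norm())
}
```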

> assuming vectorization doesn't come into play.

Now that 256-bit AVX registers process 4 numbers in one go even with 64-bit floats (and 8 with 32-bit floats), vectorization comes into play more and more.

Using 64-bit floats with 128-bit SSE registers, it was kind of possible to ignore vectorization, as it gave less than a 2x speedup. But no more.
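The lane counts above are just register width divided by element width. A tiny sketch of that arithmetic, with `lanes` as a made-up helper name:

```go
package main

import "fmt"

// lanes returns how many elements of elemBits fit in one register of regBits.
func lanes(regBits, elemBits int) int { return regBits / elemBits }

func main() {
	fmt.Println(lanes(128, 64)) // 2: SSE with 64-bit floats
	fmt.Println(lanes(256, 64)) // 4: AVX with 64-bit floats
	fmt.Println(lanes(256, 32)) // 8: AVX with 32-bit floats
}
```

These are upper bounds on the speedup per instruction, not measured results.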

"There is nothing stopping the Go compiler from using SIMD and automatic vectorization, it just doesn't yet. "

A JVM could compile byte code to SIMD instructions. Most of them don't, yet.

Auto-vectorization is impossible for a lot of real-world code because it requires changing how data is laid out in memory. Notice that the AVX version of the raytracer actually involves packing blocks of x components into a single 256-bit-wide variable. Realistically, a compiler is not going to be smart enough to figure that out.

Absolutely true, though of course you could use the same memory layout in the Go code. If we're comparing compilers, a vectorization-friendly tight inner loop that operates on contiguous memory would make a good high-performance comparison. The standard Go compiler would not vectorize it... yet... though honestly I don't know what the state of gccgo is, or whether it produces an intermediate representation that brings the gcc vectorizer into play.
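For what it's worth, the layout change being discussed can be sketched in plain Go. The names `Vec3`, `Vec3s`, `toSoA` and `dotAll` are all hypothetical. The idea is to store all x components contiguously, then all y, then all z, so the inner loop walks three flat arrays. This is the shape a vectorizer would want; the standard Go compiler still emits scalar code for it.

```go
package main

import "fmt"

// Vec3 is the usual array-of-structs element: x, y, z interleaved in memory.
type Vec3 struct{ X, Y, Z float64 }

// Vec3s is the struct-of-arrays form: each component is stored contiguously.
type Vec3s struct{ X, Y, Z []float64 }

// toSoA copies a slice of Vec3 into the struct-of-arrays layout.
func toSoA(vs []Vec3) Vec3s {
	out := Vec3s{
		X: make([]float64, len(vs)),
		Y: make([]float64, len(vs)),
		Z: make([]float64, len(vs)),
	}
	for i, v := range vs {
		out.X[i], out.Y[i], out.Z[i] = v.X, v.Y, v.Z
	}
	return out
}

// dotAll writes the dot product of the i-th vectors of a and b into dst[i].
// a and b must each hold at least len(dst) vectors.
func dotAll(dst []float64, a, b Vec3s) {
	for i := range dst {
		dst[i] = a.X[i]*b.X[i] + a.Y[i]*b.Y[i] + a.Z[i]*b.Z[i]
	}
}

func main() {
	a := toSoA([]Vec3{{1, 2, 3}, {4, 5, 6}})
	b := toSoA([]Vec3{{1, 0, 0}, {1, 1, 1}})
	dst := make([]float64, 2)
	dotAll(dst, a, b)
	fmt.Println(dst) // [1 15]
}
```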

And of course the reason you code for auto-vectorization is ease of platform support. The linked AVX code will not run on the vast majority of virtual machines, or on any CPU made prior to 2012. Nor will it take advantage of AVX2. I use the Intel compiler, and I can either produce builds targeted at specific processors or technology levels, or add support for virtually all technologies, such that the same code will vectorize on AVX2, failing that AVX, failing that SSE3.2, failing that... you get the picture. With a suitable ARM compiler the same code would vectorize to NEON, etc.
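The fallback chain can be pictured as a simple dispatch, sketched here in Go. Everything in it is an assumption for illustration: the `features` struct stands in for real CPU detection, and the returned strings stand in for separately compiled kernels. It is not how the Intel compiler implements its dispatch.

```go
package main

import "fmt"

// features is a hypothetical description of what the running CPU supports.
// A real program would fill it in from CPU detection.
type features struct{ AVX2, AVX, SSE3 bool }

// pickKernel returns the name of the widest kernel the CPU can run,
// falling back step by step to plain scalar code.
func pickKernel(f features) string {
	switch {
	case f.AVX2:
		return "avx2"
	case f.AVX:
		return "avx"
	case f.SSE3:
		return "sse3"
	default:
		return "scalar"
	}
}

func main() {
	fmt.Println(pickKernel(features{AVX2: true, AVX: true, SSE3: true})) // avx2
	fmt.Println(pickKernel(features{AVX: true, SSE3: true}))             // avx
	fmt.Println(pickKernel(features{}))                                  // scalar
}
```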