Golang folks, after looking at the code, I am surprised and don't understand why Go version is slow compared to C/C++/Rust. Can anyone please explain? Thank you.
Go's int defaults to 64 bits on a 64 bit machine while C/Rust/Java are all defaulting to 32 bit on the author's machine.
Editing the code and running local to get a difference factor, it looks like the same Go code with int takes 1.5x the amount of time as the int32 code on my machine. That puts it ~right next to Java (assuming the perf change maps 1:1). Remove the GC and it comes in ~right next to C++.
Looking at the compiler assembly output on Godbolt it's fun to see all the extra panic info and checks Go adds before going to the loop whereas the C just segfaults from blasting forward anyways, sometimes with no detail.
Anyways, that's why I don't like these types of "benchmarks" (micro tests made into a media post). By the time someone actually looks into them to see what's up everyone has already seen the results and moved on.
Usually less on computation performance (often identical or very close to in raw cycle count for the operation) but more on cache/memory performance. The way that maps is platform specific of course.
64 bit mod is much slower than 32 bit mod. Your C program uses int32_t everywhere, while your Go program uses int (which is probably 64 bit on you machine); so this is not a fair comparison. Changing the Go program to use int32 everywhere makes it 35% faster on my machine.
Author here: Yeah, I was surprised that there doesn't seem to be many options for extra optimizations in the Go compiler. Would be curious if Go experts have more insight or other compiler recommendations.
Editing the code and running local to get a difference factor, it looks like the same Go code with int takes 1.5x the amount of time as the int32 code on my machine. That puts it ~right next to Java (assuming the perf change maps 1:1). Remove the GC and it comes in ~right next to C++.
Looking at the compiler assembly output on Godbolt it's fun to see all the extra panic info and checks Go adds before going to the loop whereas the C just segfaults from blasting forward anyways, sometimes with no detail.
Anyways, that's why I don't like these types of "benchmarks" (micro tests made into a media post). By the time someone actually looks into them to see what's up everyone has already seen the results and moved on.