Hacker News
by tomsmeding 2231 days ago
Nice work on the GPU programming, and on the multicore version before that, but I'm mystified why going from -O0 to -O3 is called an "optimisation". All respect to Fabien, but if you want code to run faster than a snail, and you aren't debugging (where you might need -O0 to get reasonable output), then -O2 or -O3 is the baseline. (In practice, -O3 often doesn't gain much over -O2, despite the longer compile times.)

The initial time is not 101.8 seconds, it's 11.6 seconds.

2 comments

You could also add -ffast-math, which loosens the rules for floating-point optimizations. For example, it allows the compiler to turn floating-point divisions into multiplications by an inverse, and to regroup operations more efficiently even if doing so slightly affects rounding. It also flushes denormal (subnormal) numbers to zero, which can greatly improve performance on a lot of hardware.
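To see why the compiler needs explicit permission for those rewrites, here's a small sketch. It's in Python only because a Python float is an IEEE 754 double on mainstream platforms, so it behaves like a C double compiled without -ffast-math. The function names are made up for illustration. It shows that regrouping a sum, or swapping a division for a multiplication by the inverse, can change the result:

```python
def sum_left(a, b, c):
    # Evaluate in source order: (a + b) + c
    return (a + b) + c


def sum_regrouped(a, b, c):
    # Regrouped as a + (b + c): the kind of reassociation that
    # relaxed floating-point rules permit
    return a + (b + c)


def divide(x, d):
    # Strict division
    return x / d


def multiply_by_inverse(x, d):
    # Division rewritten as multiplication by the reciprocal
    return x * (1.0 / d)


if __name__ == "__main__":
    print(sum_left(1e20, -1e20, 1.0))       # 1.0
    print(sum_regrouped(1e20, -1e20, 1.0))  # 0.0
    print(divide(49.0, 49.0))               # 1.0
    print(multiply_by_inverse(49.0, 49.0) == 1.0)  # False
```

The sum example is deliberately extreme. In typical code the difference shows up only in the last bit or two, as in the 49.0 case, but it is still enough that a strictly conforming compiler can't make the change on its own.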

-march=native may also be useful, as it would allow the compiler to use newer CPU instructions, and tune the generated code to your hardware. That would make the program less portable, but it's not like CUDA is portable either.

My machine matches those numbers surprisingly closely. With -O0 it took 89.6s. With -O3, it took 11.7s. With -Ofast (which combines -O3 and -ffast-math), it took 10.6s. With -Ofast -march=native, it took 8.9s. I would expect those gains to extrapolate to the multi-threaded version, maybe pushing it down to 1 second without any further work. (Note: I'm using GCC on Ubuntu 18.04 with a Haswell i7. Your mileage may vary.)
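For anyone who wants those timings as ratios, here's a quick sketch that computes the speedup of each configuration over -O0. The numbers are the ones quoted above; the `timings` dict and `speedups` helper are just names picked for this example.

```python
# Wall-clock times in seconds, as reported in the comment above.
timings = {
    "-O0": 89.6,
    "-O3": 11.7,
    "-Ofast": 10.6,
    "-Ofast -march=native": 8.9,
}


def speedups(times, baseline="-O0"):
    # Speedup of each configuration relative to the baseline configuration.
    base = times[baseline]
    return {flags: base / t for flags, t in times.items()}


if __name__ == "__main__":
    for flags, factor in speedups(timings).items():
        print(f"{flags:<22} {factor:5.2f}x")
```

So -O3 alone is roughly 7.7x over -O0, while the extra flags on top of -O3 add only about another 1.3x (11.7s down to 8.9s).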

Offtopic, but I think that languages should have special float types that trigger the use of fast math. That way, a programmer can better control which parts of a program are done with approximate floating point operations.
One of the reasons why I think Zig looks appealing is that you can set the policy for these sorts of things on a per-block basis: https://ziglang.org/documentation/master/#setFloatMode
IMHO that's too fine-grained. Usually you decide which variables don't need strict accuracy rather than which code, so that's better controlled through types. Also, it's easy to limit approximations to a code block by casting to/from approximate types around it, so you can still have fine-grained control.
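A rough sketch of what the type-based idea could look like, in Python purely for illustration. `ApproxFloat` is a hypothetical type, not something any language or library provides, and the "relaxed" arithmetic is simulated by hand (division done as multiplication by the inverse), since Python has no fast-math mode:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApproxFloat:
    """Hypothetical type: a value that tolerates relaxed floating-point rules."""
    value: float

    def __truediv__(self, other):
        # Relaxed rule, simulated by hand: divide by multiplying with the inverse.
        return ApproxFloat(self.value * (1.0 / other.value))


def strict_ratio(x: float, d: float) -> float:
    # Plain floats keep strict division.
    return x / d


def approx_ratio(x: float, d: float) -> float:
    # "Cast" into the approximate type around the block, then cast back out.
    return (ApproxFloat(x) / ApproxFloat(d)).value


if __name__ == "__main__":
    print(strict_ratio(49.0, 49.0))         # 1.0
    print(approx_ratio(49.0, 49.0) == 1.0)  # False
```

The type of the operands picks the arithmetic, and wrapping/unwrapping at the edges of a block gives the same scoping you'd get from a per-block setting.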
Agree with you
Me too