Hacker News
by slavik81 2233 days ago
You could also add -ffast-math, which loosens the rules for floating-point optimizations. For example, it allows the compiler to turn floating-point divisions into multiplications by a reciprocal, and to regroup operations more efficiently even if doing so slightly affects rounding. It also flushes denormal numbers to zero, which can greatly improve performance on a lot of hardware.
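
To see why the compiler needs permission for these rewrites, here is a minimal C sketch. The function names are made up for illustration and are not from any library. It shows that regrouping an addition, or replacing a division with a multiplication by the reciprocal, can change the result under strict IEEE rules, which is why GCC only does this when you pass -ffast-math.

```c
#include <stdio.h>

/* Evaluates (a + b) + c, the order written in the source. */
double sum_left(double a, double b, double c) {
    return (a + b) + c;
}

/* Evaluates a + (b + c), a regrouping the compiler may only
   choose by itself when fast-math style reassociation is on. */
double sum_right(double a, double b, double c) {
    return a + (b + c);
}

/* A correctly rounded division. */
double div_exact(double x, double d) {
    return x / d;
}

/* The cheaper form: one reciprocal, then a multiply.
   The reciprocal is rounded first, so the product can be
   off by one unit in the last place. */
double div_by_reciprocal(double x, double d) {
    double inv = 1.0 / d;
    return x * inv;
}
```

Compiled without -ffast-math, sum_left(1e20, -1e20, 1.0) gives 1.0 while sum_right(1e20, -1e20, 1.0) gives 0.0, and div_exact(3.0, 10.0) differs from div_by_reciprocal(3.0, 10.0) in the last bit. With -ffast-math the compiler is free to treat each pair as interchangeable.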

-march=native may also be useful, as it would allow the compiler to use newer CPU instructions, and tune the generated code to your hardware. That would make the program less portable, but it's not like CUDA is portable either.
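
A rough way to check what -march=native actually enabled is to look at the predefined macros GCC sets for each x86 instruction set extension. The sketch below assumes GCC or Clang on x86; simd_level and saxpy are hypothetical names I picked for the example, and the loop is only a typical candidate for auto-vectorization, not the code from the article.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i]. A simple loop of the kind compilers
   commonly auto-vectorize at -O3, using wider registers when
   the target allows it. */
void saxpy(float a, const float *x, float *y, size_t n) {
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Reports the widest x86 SIMD extension that was enabled at
   compile time, based on the compiler's predefined macros. */
const char *simd_level(void) {
#if defined(__AVX512F__)
    return "AVX-512F";
#elif defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#elif defined(__SSE2__)
    return "SSE2";
#else
    return "baseline (no x86 SIMD macro detected)";
#endif
}
```

On a typical x86-64 build without -march, simd_level() should report SSE2. On a Haswell with -march=native I would expect AVX2, but that depends on your CPU and compiler.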

My machine matches those numbers surprisingly closely. With -O0 it took 89.6s. With -O3, it took 11.7s. With -Ofast (which combines -O3 and -ffast-math), it took 10.6s. With -Ofast -march=native, it took 8.9s. I would expect those gains to extrapolate to the multi-threaded version, maybe pushing it down to 1 second without any further work. (Note: I'm using GCC on Ubuntu 18.04 with a Haswell i7. Your mileage may vary.)
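
If you want to repeat the comparison on your own machine, a small timing harness is enough. This is only a sketch: the workload is a stand-in I made up (one division per element), not the program from the article, and time_workload is a hypothetical helper. Build it once per flag set, for example with -O0, -O3, -Ofast and -Ofast -march=native, and compare the times.

```c
#include <stdlib.h>
#include <time.h>

/* Hypothetical workload: a reduction with one division per element. */
static double workload(const double *x, size_t n) {
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        acc += x[i] / 3.0;
    }
    return acc;
}

/* Runs the workload `repeats` times over n elements and returns
   the CPU time used in seconds, or -1.0 if allocation fails.
   The accumulated result is written to *sink so the compiler
   cannot discard the work. */
double time_workload(size_t n, int repeats, double *sink) {
    double *x = malloc(n * sizeof *x);
    if (x == NULL) {
        return -1.0;
    }
    for (size_t i = 0; i < n; i++) {
        x[i] = (double)i;
    }

    double total = 0.0;
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        x[0] = (double)r;  /* change the input so the call is not hoisted */
        total += workload(x, n);
    }
    clock_t end = clock();

    free(x);
    *sink = total;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that clock() measures CPU time, so for a multi-threaded version you would want a wall-clock timer instead.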


Offtopic, but I think that languages should have special float types that trigger the use of fast math. That way, a programmer can better control which parts of a program are done with approximate floating point operations.
One of the reasons why I think Zig looks appealing is that you can set the policy for these sorts of things on a per-block basis: https://ziglang.org/documentation/master/#setFloatMode
IMHO that's too fine-grained. Usually you determine which variables don't need strict accuracy rather than which code, so that's better controlled through types. Also, it's easy to limit approximations to a code block by casting to and from approximate types around it, so you can still have fine-grained control.