| HN Mirror



	by tomsmeding 2231 days ago
	Nice work on the GPU programming, and the multicore before that, but I'm mystified why going from -O0 to -O3 is called an "optimisation". All respect to Fabien, but any code that's supposed to run faster than a snail implies -O2 or -O3, unless you're debugging and need -O0 for reasonable output. (In practice, -O3 often doesn't give much performance over -O2, despite increasing compile times.) The real starting time is not 101.8 seconds, it's 11.6 seconds.

2 comments

slavik81 2231 days ago

You could also add -ffast-math, which loosens the rules for floating point optimizations. For example, it would allow the compiler to turn floating point divisions into multiplications by an inverse, and to regroup operations more efficiently even if doing so would slightly affect rounding. It also flushes denormal numbers to zero, which can greatly improve performance on a lot of hardware.
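
To make the rounding point concrete, here is a minimal C sketch (not part of the original comment; all function names are made up for illustration). It shows why a strict compiler may not regroup additions on its own, and what a denormal value looks like. It assumes ordinary IEEE 754 doubles and a build without -ffast-math; the `__FAST_MATH__` macro is the one GCC and Clang predefine when fast math is enabled.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/* Adds three values strictly left to right: (a + b) + c. */
double sum_left_first(double a, double b, double c) {
    return (a + b) + c;
}

/* Same three values, regrouped: a + (b + c). */
double sum_right_first(double a, double b, double c) {
    return a + (b + c);
}

/* Returns 1 if x is positive but below the smallest normal double. */
int is_subnormal(double x) {
    return x > 0.0 && x < DBL_MIN;
}

/* Hypothetical demo helper: prints both groupings and a denormal value. */
void show_strict_fp_behaviour(void) {
    double left = sum_left_first(1e100, -1e100, 1.0);
    double right = sum_right_first(1e100, -1e100, 1.0);

#ifndef __FAST_MATH__
    /* Under strict IEEE rules the grouping changes the answer, because
       -1e100 + 1.0 rounds back to -1e100 and the 1.0 is lost. */
    assert(left != right);
#endif
    printf("(a + b) + c = %g\n", left);
    printf("a + (b + c) = %g\n", right);

    /* Halving the smallest normal double gives a denormal in strict mode;
       with flush-to-zero enabled it may be treated as 0 instead. */
    double tiny = DBL_MIN / 2.0;
    printf("DBL_MIN / 2 is %s\n", is_subnormal(tiny) ? "denormal" : "not denormal");
}
```

Calling `show_strict_fp_behaviour()` from a program built with plain -O2 should show the two sums disagreeing. With -ffast-math the compiler is free to pick either grouping, so the results are no longer guaranteed to differ.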

-march=native may also be useful, as it would allow the compiler to use newer CPU instructions, and tune the generated code to your hardware. That would make the program less portable, but it's not like CUDA is portable either.

My machine matches those numbers surprisingly closely. With -O0 it took 89.6s. With -O3, it took 11.7s. With -Ofast (which combines -O3 and -ffast-math), it took 10.6s. With -Ofast -march=native, it took 8.9s. I would expect those gains to extrapolate to the multi-threaded version, maybe pushing it down to 1 second without any further work. (Note: I'm using GCC on Ubuntu 18.04 with a Haswell i7. Your mileage may vary.)
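
For anyone who wants the ratios rather than the raw seconds, a small C sketch (the helper names are hypothetical; the timings are the ones quoted in this comment, nothing new was measured):

```c
#include <assert.h>
#include <stdio.h>

/* Speedup factor of a new wall-clock time relative to a baseline. */
double speedup(double baseline_seconds, double new_seconds) {
    assert(new_seconds > 0.0);
    return baseline_seconds / new_seconds;
}

/* Hypothetical helper: prints the speedups for the timings quoted above. */
void print_quoted_speedups(void) {
    /* Seconds reported for GCC on Ubuntu 18.04 with a Haswell i7. */
    const double o0 = 89.6;
    const double o3 = 11.7;
    const double ofast = 10.6;
    const double ofast_native = 8.9;

    printf("-O0 -> -O3:                  %.2fx\n", speedup(o0, o3));
    printf("-O3 -> -Ofast:               %.2fx\n", speedup(o3, ofast));
    printf("-O3 -> -Ofast -march=native: %.2fx\n", speedup(o3, ofast_native));
}
```

That works out to roughly 7.7x from turning on optimisation at all, and about 1.3x on top of -O3 from -Ofast -march=native. Whether the same factor carries over to the multi-threaded version is the commenter's guess, not something these numbers show.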


amelius 2231 days ago

Off-topic, but I think that languages should have special float types that trigger the use of fast math. That way, a programmer can better control which parts of a program are done with approximate floating point operations.


slavik81 2231 days ago

One of the reasons why I think Zig looks appealing is that you can set the policy for these sorts of things on a per-block basis: https://ziglang.org/documentation/master/#setFloatMode


amelius 2231 days ago

IMHO that's too fine-grained. Usually you determine which variables don't need strict accuracy rather than which code, so that's better controlled through types. Also, it's easy to limit approximations to a code block by casting to and from approximate types around it, so you can still have fine-grained control.


Natasha35Khan 2231 days ago

Agree with you


Aleeshakhan786 2231 days ago

Me too
