Hacker News new | ask | show | jobs
by user51442 3475 days ago
A simple recursive factorial function:

https://godbolt.org/g/97dUFg

Not sure if that's smart or stupid.

5 comments

It does try to vectorize it (doing multiple math operations at once is faster).

(1 * 5 * 9 * 13 * 17 * ...) * (2 * 6 * 10 * 14 * 18 * ...) * (3 * 7 * 11 * 15 * 19 * ...) * (4 * 8 * 12 * 16 * 20 * ...)

(it's not precisely that, but close enough)

It isn't optimal however, optimal code would be pre-computed results (signed integer overflow is undefined, so n <= 12 is defined)

(if signed integer overflow would be defined to be 2's complement overflow, you can still use a table, as n > 33 gives 0)

Dude, your code got:

1. Turned from recursive form into iterative form 2. Unrolled heavily 3. Autovectorized (!)

The throughput will be substantially more than the simple version.

Well, the optimizations are pretty smart to be sure, the stupid thing is that anything over 12! overflows (as GlitchMr points out).

I'd like to know how the recursion to iteration step was done.

Presumably there is a pass to turn f(recur(x), y) into recur(f(y, acc), x) and then tail-call optimization can be applied. This works for any associative f.
I took that and check out what Clang does with 32-bit ints:

https://godbolt.org/g/ORsX5h

vs. what it does with 64-bit ints:

https://godbolt.org/g/2wBzn3

Can anybody explain the 32 bit version?

It's just vectorizing calculations so that it's faster than pure iterative calculation. 64 bit version doesn't get this probably because that optimizer isn't SSE2 aware yet (just a guess I actually don't know) and can't do SIMD arithmetic with 2 64 bit floats
Harder to vectorize 64-bit arithmetic?
was going to say this. It probably only does SSE and not SSE2. Therefore vectorization only happens for 32 bit ints.
Seems to need -mavx2 to really go to town with 64 bit: https://godbolt.org/g/6EFYeY
Thank you both, I appreciate the insight!
Interestingly, all the RISC architectures except for 64-bit ARM seem to compile down to something a lot simpler.
Use -Os to check. Compiler turned recursive function to loop.