1. Turned from recursive form into iterative form 2. Unrolled heavily 3. Autovectorized (!)
The throughput will be substantially more than the simple version.
I'd like to know how the recursion to iteration step was done.
I'd like to know how the recursion to iteration step was done.