HN Mirror



	by fuber2018 1072 days ago
	If I unroll the main while loop in the SWAR version so that each pass handles 4x as much data, the runtime drops to 0.0562 s (average of 10 runs). That's a 57.5x speedup overall.
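The thread doesn't say what the SWAR function computes. Purely as a hypothetical illustration, here is a minimal C sketch of a 64-bit SWAR loop unrolled 4x, assuming the task is counting occurrences of one byte value in a buffer. The names `count_byte_swar_x4`, `count_in_word` and `zero_byte_mask` are made up for this example. The per-word test is the standard exact SWAR zero-byte trick (no false positives from borrows between bytes), followed by a multiply to sum the per-byte flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES 0x0101010101010101ULL /* 0x01 in every byte */
#define LOW7 0x7F7F7F7F7F7F7F7FULL /* 0x7F in every byte */

/* 0x80 in each byte of x that is 0x00, 0x00 in every other byte. */
static inline uint64_t zero_byte_mask(uint64_t x) {
    uint64_t t = (x & LOW7) + LOW7; /* bit 7 set iff low 7 bits nonzero; no carry crosses bytes */
    return ~(t | x | LOW7);         /* bit 7 survives only where the whole byte was zero */
}

/* Number of bytes in w equal to the byte value broadcast in pat. */
static inline unsigned count_in_word(uint64_t w, uint64_t pat) {
    uint64_t m = zero_byte_mask(w ^ pat) >> 7; /* 0x01 per matching byte */
    return (unsigned)((m * ONES) >> 56);       /* top byte = sum of all bytes (at most 8) */
}

/* Hypothetical task: count bytes equal to c, main loop unrolled 4x. */
size_t count_byte_swar_x4(const unsigned char *buf, size_t len, unsigned char c) {
    const uint64_t pat = ONES * c; /* c copied into all 8 bytes */
    size_t i = 0, count = 0;

    /* Unrolled main loop: 4 independent 64-bit words (32 bytes) per pass. */
    for (; i + 32 <= len; i += 32) {
        uint64_t w0, w1, w2, w3;
        memcpy(&w0, buf + i, 8); /* memcpy avoids alignment/aliasing problems */
        memcpy(&w1, buf + i + 8, 8);
        memcpy(&w2, buf + i + 16, 8);
        memcpy(&w3, buf + i + 24, 8);
        count += count_in_word(w0, pat) + count_in_word(w1, pat)
               + count_in_word(w2, pat) + count_in_word(w3, pat);
    }
    /* Leftover whole 64-bit words. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, 8);
        count += count_in_word(w, pat);
    }
    /* Scalar tail for the last 0..7 bytes. */
    for (; i < len; i++)
        count += (buf[i] == c);
    return count;
}
```

For example, counting `'\n'` in a 45-byte string runs one unrolled pass over bytes 0..31, one leftover word over bytes 32..39, and the scalar tail for the last 5 bytes. The unrolling mainly cuts loop-control overhead and gives the CPU four independent word computations to overlap; byte order doesn't matter because every byte in the word is counted.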


fuber2018 1072 days ago

If I convert the unrolled 64-bit SWAR function to use 32-bit chunks instead, the average runtime almost doubles, to approx. 0.1 s.
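As a hypothetical sketch only (again assuming the task is counting one byte value, with made-up names `count_byte_swar32_x4` and `count_in_word32`), a 32-bit version of the same 4x-unrolled loop handles 4 bytes per word instead of 8, so the same buffer needs twice as many word operations and loop passes. That is one plausible reason the runtime roughly doubles, though the thread doesn't investigate it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES32 0x01010101u  /* 0x01 in every byte */
#define LOW7_32 0x7F7F7F7Fu /* 0x7F in every byte */

/* Number of bytes in w equal to the byte value broadcast in pat. */
static inline unsigned count_in_word32(uint32_t w, uint32_t pat) {
    uint32_t x = w ^ pat;                  /* matching bytes become 0x00 */
    uint32_t t = (x & LOW7_32) + LOW7_32;  /* bit 7 set iff low 7 bits nonzero */
    uint32_t m = (uint32_t)~(t | x | LOW7_32) >> 7; /* 0x01 per matching byte */
    return (unsigned)((uint32_t)(m * ONES32) >> 24); /* top byte = sum (at most 4) */
}

/* Hypothetical task: count bytes equal to c using 32-bit chunks, unrolled 4x. */
size_t count_byte_swar32_x4(const unsigned char *buf, size_t len, unsigned char c) {
    const uint32_t pat = ONES32 * c; /* c copied into all 4 bytes */
    size_t i = 0, count = 0;

    /* 4 words x 4 bytes = 16 bytes per pass, half of the 64-bit version. */
    for (; i + 16 <= len; i += 16) {
        uint32_t w0, w1, w2, w3;
        memcpy(&w0, buf + i, 4);
        memcpy(&w1, buf + i + 4, 4);
        memcpy(&w2, buf + i + 8, 4);
        memcpy(&w3, buf + i + 12, 4);
        count += count_in_word32(w0, pat) + count_in_word32(w1, pat)
               + count_in_word32(w2, pat) + count_in_word32(w3, pat);
    }
    /* Leftover whole 32-bit words. */
    for (; i + 4 <= len; i += 4) {
        uint32_t w;
        memcpy(&w, buf + i, 4);
        count += count_in_word32(w, pat);
    }
    /* Scalar tail for the last 0..3 bytes. */
    for (; i < len; i++)
        count += (buf[i] == c);
    return count;
}
```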

Need sleep now.


fuber2018 1072 days ago

If I unroll the 64-bit SWAR version 8x instead of 4x, the runtime drops by another 10% relative to the 4x-unrolled version. Diminishing returns...
