| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by red2awn 1066 days ago
	I experimented with different optimizations and ended with 128x speedup. The improvement mainly comes from manual SIMD intrinsics, but you can go a long way just by making the code more auto-vectorization friendly as some other comments have mentioned. See: https://ipthomas.com/blog/2023/07/n-times-faster-than-c-wher...