by Const-me
1073 days ago
I’m probably an optimization expert, and I would solve that problem completely differently. On my computer, the initial C version runs at 389 MB/second. I haven’t tested the assembly versions, but if they deliver the same 6.2x speedup, that would come to about 2.4 GB/second here. Here’s a C++ version that, for long buffers, exceeds 24 GB/second on my computer: https://gist.github.com/Const-me/3ade77faad47f0fbb0538965ae7...
That’s a 61x speedup over the original version, without any assembly, using AVX2 intrinsics.
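
For anyone checking the arithmetic, here is a minimal sketch that reproduces the two derived figures from the measured ones. The helper names are made up for illustration, and it assumes 1 GB = 1000 MB and treats "exceeds 24 GB/second" as exactly 24 GB/second, so the real ratio is a bit higher.

```cpp
#include <cassert>

// Figures quoted in the comment, in MB/second.
// Assumption: 1 GB/second is taken as 1000 MB/second.
constexpr double kBaselineMBps = 389.0;   // initial C version
constexpr double kAvx2MBps     = 24000.0; // "exceeds 24 GB/second", used as a lower bound
constexpr double kAsmSpeedup   = 6.2;     // speedup reported for the assembly versions

// Hypothetical helper: throughput expected if a speedup carries over to this machine.
constexpr double projectedMBps(double baselineMBps, double speedup) {
    return baselineMBps * speedup;
}

// Hypothetical helper: ratio of a faster throughput to a baseline throughput.
constexpr double speedupOver(double fastMBps, double baselineMBps) {
    return fastMBps / baselineMBps;
}
```

`projectedMBps(kBaselineMBps, kAsmSpeedup)` comes out a little above 2400 MB/second, which is the 2.4 GB/second estimate, and `speedupOver(kAvx2MBps, kBaselineMBps)` lands between 61 and 62, which rounds down to the quoted 61x.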