by 414owen
1073 days ago
The version that's friendly to the compiler is described in part two: https://owen.cafe/posts/the-same-speed-as-c/ It achieves 3.88GiB/s. I intentionally didn't go down the route of vectorizing. I wanted to keep the scope of the problem small and show off the assembly tips and tricks in the post, but maybe there's potential for a future post where I pad the input string and vectorize the algorithm :)
Don't want to pass the string length? That's fine, we can figure that out for ourselves. This code:
Is 27GB/s. With a little bit of blocking: That's ~55GB/s.

Anyway, the point is, you're pretty far from the point where you ought to give up on C and dive into assembly.
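
For illustration only, here is a rough sketch of the kind of C the comment above seems to be describing. It is not the commenter's original code, and it makes no claim to reach the throughput figures quoted. It assumes the task from the linked post is to add one for every 's' and subtract one for every 'p'. The function names (`count_simple`, `count_blocked`, `count_cstring`) are made up for this example. "Blocking" is read here as accumulating into a narrow counter over short blocks so that a compiler has an easier time vectorizing the inner loop, which is only one plausible interpretation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: branch-free loop over a buffer of known length.
 * Each comparison yields 0 or 1, so the sum moves by +1, -1, or 0. */
int count_simple(const char *s, size_t n) {
    int res = 0;
    for (size_t i = 0; i < n; i++) {
        res += (s[i] == 's') - (s[i] == 'p');
    }
    return res;
}

/* Hypothetical "blocked" variant: accumulate into an 8-bit counter over
 * blocks of at most 127 bytes (so it cannot overflow), then widen.
 * Assumption: narrow accumulators let more lanes fit per vector register. */
int count_blocked(const char *s, size_t n) {
    int res = 0;
    while (n > 0) {
        size_t block = n < 127 ? n : 127;
        signed char acc = 0;
        for (size_t i = 0; i < block; i++) {
            acc += (s[i] == 's') - (s[i] == 'p');
        }
        res += acc;
        s += block;
        n -= block;
    }
    return res;
}

/* No length passed in: find it with strlen first, then run the loop. */
int count_cstring(const char *s) {
    return count_blocked(s, strlen(s));
}
```

Whether either loop actually gets vectorized depends on the compiler and flags (for example `-O3` with a suitable `-march`), so it is worth checking the generated assembly before drawing conclusions.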