by nwallin
1073 days ago
Replying to my own post: there was no off-by-one error after all. I was calling the function wrong: I had been passing the size of the buffer, not the length of the string. Also, someone else figured out that we can just use an and instruction instead of cmp. That gives us this version:

  #include <stddef.h>
  #include <stdint.h>

  int run_switches(const char *s, const size_t n) {
    int res = 0;
    uint8_t tmp = 0;
    /* Leftover n % 128 bytes first: count the bytes whose low bit is set. */
    for (int i = n & 127; i--; ++s)
      tmp += 1 & *s;
    res += tmp;
    /* Then full blocks of 128 bytes; tmp never exceeds 128, so 8 bits suffice. */
    for (int i = n >> 7; i--;) {
      tmp = 0;
      for (int j = 128; j--; ++s)
        tmp += 1 & *s;
      res += tmp;
    }
    /* Counted bytes are +1, the rest are -1: res - (n - res). */
    return 2 * res - n;
  }
This is 111GB/s, up from 4.5GB/s in the blog. I'm going to try really hard to put this problem down now and work on something more productive.
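For anyone who wants to check the trick, here is a minimal sketch that compares the low-bit version against a naive loop. It assumes the input contains only 's' (add one) and 'p' (subtract one), which this comment does not restate. In ASCII, 's' is 0x73 (low bit set) and 'p' is 0x70 (low bit clear), so `1 & *s` counts the 's' bytes. The names `run_switches_ref`, `run_switches_and`, and `check_same` are hypothetical, and the `(int)n` cast is my addition to keep the final subtraction in signed arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Naive reference. Assumed semantics: 's' adds 1, 'p' subtracts 1. */
int run_switches_ref(const char *s, size_t n) {
    int res = 0;
    for (size_t i = 0; i < n; ++i) {
        if (s[i] == 's') ++res;
        else if (s[i] == 'p') --res;
    }
    return res;
}

/* Low-bit version: count the bytes with the low bit set in 8-bit blocks. */
int run_switches_and(const char *s, size_t n) {
    int res = 0;
    uint8_t tmp = 0;
    /* Leftover n % 128 bytes. */
    for (size_t i = n & 127; i--; ++s)
        tmp += 1 & *s;
    res += tmp;
    /* Full blocks of 128 bytes; tmp stays within 0..128. */
    for (size_t i = n >> 7; i--;) {
        tmp = 0;
        for (int j = 128; j--; ++s)
            tmp += 1 & *s;
        res += tmp;
    }
    /* res bytes count +1 and the other n - res bytes count -1. */
    return 2 * res - (int)n;
}

/* Hypothetical helper: abort if the two versions disagree on s[0..n). */
void check_same(const char *s, size_t n) {
    assert(run_switches_and(s, n) == run_switches_ref(s, n));
}
```

With a 300-character string in a zero-terminated buffer, `check_same(buf, 300)` passes. Passing 301 instead makes the low-bit version count the terminating NUL as a 'p', so its result comes out exactly one lower. That is presumably how passing the buffer size can look like an off-by-one error.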