by saynsedit
3505 days ago
Nah, the correct solution is simply to use memcpy(). It works on all compilers, all platforms, and all versions, with SSE and with any flags specified:

    #include <stdint.h>
    #include <string.h>   /* memcpy, size_t */

    uint64_t sum (const char *p, size_t nwords)
    {
        uint64_t res = 0;
        size_t i;
        for (i = 0; i < nwords; i++) {
            uint64_t tmp;
            /* memcpy has no alignment requirement, unlike *(uint64_t *)p */
            memcpy(&tmp, p + i * sizeof(tmp), sizeof(tmp));
            res += tmp;
        }
        return res;
    }
|
|
Deal breaker: your memcpy invocation requires a sufficiently smart compiler to turn it into a normal unaligned load on x86, and it seems to prevent GCC autovectorization. In this case the OP actually didn't want vectorization, but in general such workarounds can confuse compilers and produce worse code.
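For anyone who wants to check this on their own toolchain, here is a minimal sketch of the memcpy idiom factored into a helper. The names `load_u64` and `sum_unaligned` are made up for illustration and do not come from the thread. Whether the copy collapses to a single load, and whether the loop gets vectorized, depends on the compiler and flags, so compare the disassembly rather than taking either claim on faith.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a uint64_t from an address of any alignment.
   memcpy copies byte-wise in the abstract machine, so no aligned access
   is assumed; an optimizing compiler may still emit one plain load. */
static inline uint64_t load_u64(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sum nwords consecutive 8-byte words starting at p (p may be misaligned). */
uint64_t sum_unaligned(const char *p, size_t nwords)
{
    uint64_t res = 0;
    for (size_t i = 0; i < nwords; i++)
        res += load_u64(p + i * sizeof(uint64_t));
    return res;
}
```

Building this with something like `gcc -O2 -S` and reading the loop body is the quickest way to see what a given compiler version actually does with it.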