by a_t48
696 days ago
Optimizing the leftovers loop to

  #pragma clang loop vectorize(enable)
  #pragma clang loop interleave(enable)
  // Assumes the remaining length is a multiple of 4; otherwise the last load reads past the end.
  for (; offset < length; offset += 4) {
    // Load four bytes at once, then test each byte lane against 0x7F.
    const auto x = ((uint32_t*)start)[offset / 4];
    count += ((x & 0xFF) == 0x7F);
    count += ((x & 0xFF00) == 0x7F00);
    count += ((x & 0xFF0000) == 0x7F0000);
    count += ((x & 0xFF000000) == 0x7F000000);
  }

also gives some points. It'd probably be more if I could be bothered to break apart your assembly. :)
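
For anyone who wants to try it, here is a self-contained sketch of the same loop. The function name `count_7f` is made up, and the `memcpy` load and the scalar tail are my own assumptions, added so it stays in bounds when the length isn't a multiple of 4. I haven't benchmarked this variant.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: counts bytes equal to 0x7F in [start, start + length).
std::size_t count_7f(const unsigned char* start, std::size_t length) {
    std::size_t count = 0;
    std::size_t offset = 0;

    // Word-at-a-time part: runs only while a full 4-byte word remains.
#pragma clang loop vectorize(enable)
#pragma clang loop interleave(enable)
    for (; offset + 4 <= length; offset += 4) {
        std::uint32_t x;
        // memcpy instead of a pointer cast (assumption, not in the original):
        // sidesteps alignment and strict-aliasing concerns.
        std::memcpy(&x, start + offset, sizeof x);
        count += ((x & 0xFFu) == 0x7Fu);
        count += ((x & 0xFF00u) == 0x7F00u);
        count += ((x & 0xFF0000u) == 0x7F0000u);
        count += ((x & 0xFF000000u) == 0x7F000000u);
    }

    // Scalar tail for the last 0 to 3 bytes.
    for (; offset < length; ++offset) {
        count += (start[offset] == 0x7F);
    }
    return count;
}
```

Each byte lane is tested on its own, so the count doesn't depend on byte order. Compilers other than clang should just ignore the pragmas, possibly with a warning.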