by ninkendo 726 days ago
The idea is to minimize the number of “am I done?” checks. If you have to copy 100 bytes, a naive approach would be to just do `while (cur < last) { dest[cur] = src[cur]; cur++; }`, but that checks `cur < last` once for every single byte.

So you’d rather do something like

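    while (cur < last) {
        /* 8 copies per loop check; the increment is a separate statement
           because dest[cur] = src[cur++] is undefined behavior in C */
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
        dest[cur] = src[cur]; cur++;
    }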

That way you copy multiple bytes per check. But it only works if you have an exact multiple of 8 (in this case) bytes to copy; otherwise the last pass runs past the end.

The cleverness of Duff’s device is that it lets you write essentially the above “unrolled” loop, but with an embedded switch statement that jumps into the middle of the loop body, skipping the correct number of byte copies when the number of bytes is not a multiple of 8. You only need to do that once: after that first partial pass, the number of remaining bytes is a multiple of 8 and now you’re just doing a normal loop.
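
For the curious, here is roughly what that looks like in C. Treat it as a sketch rather than the canonical version: the function name `duff_copy` and its parameters are made up for illustration, and it copies memory to memory with incrementing pointers instead of using the `cur`/`last` indices from above. The `case` labels sit inside the `do`/`while` body, so the switch jumps straight to the right copy on the first pass.

```c
#include <stddef.h>

/* Hypothetical helper: copy `count` bytes from src to dest,
   checking the loop condition once per 8 bytes. */
void duff_copy(unsigned char *dest, const unsigned char *src, size_t count)
{
    if (count == 0) {
        return;                       /* otherwise case 0 would copy 8 bytes */
    }

    size_t passes = (count + 7) / 8;  /* number of loop checks, rounded up */

    switch (count % 8) {              /* jump into the middle of the unrolled body */
    case 0: do { *dest++ = *src++;    /* fall through on every case below */
    case 7:      *dest++ = *src++;
    case 6:      *dest++ = *src++;
    case 5:      *dest++ = *src++;
    case 4:      *dest++ = *src++;
    case 3:      *dest++ = *src++;
    case 2:      *dest++ = *src++;
    case 1:      *dest++ = *src++;
            } while (--passes > 0);
    }
}
```

With `count = 100`, `count % 8` is 4, so the first pass enters at `case 4` and copies 4 bytes. The remaining 96 bytes take 12 full passes of 8. In practice you would just call `memcpy`, and a modern compiler will usually unroll the naive loop by itself anyway.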