That's really interesting to hear. I tried Duff's Device myself, once upon a time, and found it made no measurable difference, so I pulled it back out. This was a long time ago, so I had assumed that a similar optimization was simply built into most C compilers these days. Does it vary by toolchain? I believe I was using clang at the time.
(I was still quite wet behind the ears at the time, so it's also more than possible that I was just doing something wrong.)
All that said, agreed, wonderful little mechanism. It's one of those things that every C programmer should take the time to really understand.
Yes, modern compilers unroll loops on their own, so Duff's device doesn't really have a practical use anymore. But it's certainly a wonderful bit of history. I found Duff's device when I was researching coroutines in C.
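For anyone following along, here is a rough sketch of what's being compared. The function names copy_plain and copy_duff are made up for illustration, and this is the memory-to-memory variant (the original wrote every byte to a single output register, so `to` was never incremented). Whether the plain loop actually gets unrolled depends on the compiler and the optimization flags, so treat this as an illustration rather than a benchmark.

```c
#include <stddef.h>

/* Plain byte copy. With optimization enabled, a modern compiler is
 * free to unroll or vectorize this loop by itself. */
void copy_plain(char *to, const char *from, size_t count)
{
    for (size_t i = 0; i < count; i++)
        to[i] = from[i];
}

/* Duff's device: the loop body is unrolled 8 times by hand, and the
 * switch jumps into the middle of the do-while so that the first pass
 * copies the count % 8 leftover bytes (or a full 8 when that is 0). */
void copy_duff(char *to, const char *from, size_t count)
{
    if (count == 0)
        return;                     /* the device itself assumes count > 0 */

    size_t n = (count + 7) / 8;     /* number of passes through the loop */

    switch (count % 8) {
    case 0: do { *to++ = *from++;   /* each case falls through on purpose */
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

Both functions should produce the same result for any count; the only difference is who does the unrolling, you or the compiler.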
No, this is just a wonderful bit of obsolete hacker history, unless you mess with retro architectures for fun (like old PDP machines, be they emulated or real), in which case it's somewhat relevant.