Hacker News
by powersnail 1716 days ago
> What bothered me about the original implementation was the lookup table. Even though I knew they’d be cached, I still thought the memory accesses might have a detrimental affect on performance.

The encoding lookup table is an array of four chars. I'd be surprised if accessing such an array has a detrimental effect on any program.

I also wonder why the graphs say "Encode / Decode", as if you are combining the performance of the encoding function and the decoding. Have you considered separating them?

It would also help reproducibility if you included your compiler, its version, and the flags you used. You mentioned that you've turned off all optimizations, but I wonder why.

"O0" would certainly produce a lot of jumps for the switch, but O2 is almost certainly going to eliminate them. In fact, with O2, gcc 11 seems to produce identical code for the switch and the lookup table.

https://godbolt.org/z/jdx645MsY
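
For anyone not following the link, the two variants being compared presumably look something like the sketch below. This is not the code behind the link: the function names are made up, and the dibit-to-character mapping is an assumption inferred from the packed constant quoted in the reply.

```c
/* Assumed mapping (not confirmed by the article):
   0 -> '\t', 1 -> '\n', 2 -> '\r', 3 -> ' '. */
static const unsigned char encode_table[4] = { '\t', '\n', '\r', ' ' };

/* Lookup-table variant: one indexed load from the data segment. */
unsigned char lookup_encode(unsigned char dibit) {
    return encode_table[dibit & 3u];
}

/* Switch variant: at -O0 this compiles to compare-and-jump sequences;
   an optimizing build is free to turn it into a table or a shift. */
unsigned char switch_encode(unsigned char dibit) {
    switch (dibit & 3u) {
    case 0:  return '\t';
    case 1:  return '\n';
    case 2:  return '\r';
    default: return ' ';
    }
}
```

Both functions return the same byte for every dibit, so any difference in a benchmark comes down to what the compiler does with them at the chosen optimization level.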

1 comment

clang does something smart with the switch, equivalent to this C code:

  unsigned char lookup2_encode(const unsigned char dibit) {
    return ' \r\n\t' >> (dibit * 8);
  }

I'd expect that to be faster simply because the lookup table lives in the code itself, so it doesn't have to touch a different cache line in the data segment (and it avoids the pointer indirection as well).
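
One caveat: the value of a multi-character constant like ' \r\n\t' is implementation-defined. GCC and clang build it by shifting each character in from the right, which gives 0x200D0A09 here. A more portable way to write the same trick, as a rough sketch (the name packed_encode and the mask are my additions, not something clang emits), would be:

```c
#include <stdint.h>

/* Hypothetical name. The four output bytes are packed into one 32-bit
   constant, with the byte for dibit 0 in the lowest position:
   0x09 '\t', 0x0A '\n', 0x0D '\r', 0x20 ' '. */
unsigned char packed_encode(unsigned char dibit) {
    const uint32_t table = 0x200D0A09u;
    /* Mask to two bits so the shift count stays below 32. */
    return (unsigned char)(table >> ((dibit & 3u) * 8u));
}
```

The mask is there only to keep the shift well-defined for out-of-range input; if the caller guarantees a two-bit value, it can be dropped.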