Hacker News new | ask | show | jobs
by ErikCorry 1645 days ago
I would think that you could use a hybrid approach where you have a table that is perhaps 9 or 10 bits wide and covers many of the more common codes, which will by definition be more common. Should be small enough to fit in the cache. Then do something slower for the very long codes. This way you avoid difficult branches most of the time.
1 comments

Exactly. Normally you use second-level tables for the long codes. In the first table, if code doesn't reach a leaf, tbl[code] holds a pointer to the next table to use.

For example, here's JPEG XL's Huffman decoder: https://github.com/libjxl/libjxl/blob/1c67f4eff0464e6241f365.... The branch uses a second table if the code was too long.