| fgiesen has a 3 part series of blog posts from 2018, "reading bits in far too many ways", focusing on the problem of reading data encoded a variable-bit-length code Part 3 explores a straightforward idea that can be used to improve performance of bitstream decoding - decode multiple streams at once. > if decoding from a single bitstream is too serial, then why not decode from multiple bitstreams at once? > So how many streams should you use? It depends. At this point, for anything that is even remotely performance-sensitive, I would recommend trying at least N=2 streams. Even if your decoder has a lot of other stuff going on (computations with the decoded values etc.), bitstream decoding tends to be serial enough that there’s many wasted cycles otherwise, even on something relatively narrow like a dual-issue in-order machine. > If you’re using multiple streams, you need to decide how these multiple streams get assembled into the final output bitstream. If you don’t have any particular reason to go with a fine-grained interleaving, the easiest and most straightforward option is to concatenate the sub-streams For delta coding, perhaps instead of resetting the delta encoding stream with an absolute value every X deltas, for a small value of X, this suggests we could use a much larger value of X, i.e. partition the input into coarse chunks and delta encode each, and then rewrite our decoder to decode a pair (or more) of the resulting streams at once. Unclear if that would help at all here, or if there's already enough independent work for the CPU to keep busy with. Part 1 https://fgiesen.wordpress.com/2018/02/19/reading-bits-in-far... Part 2 https://fgiesen.wordpress.com/2018/02/20/reading-bits-in-far... Part 3 https://fgiesen.wordpress.com/2018/09/27/reading-bits-in-far... |