|
|
|
|
|
by jmillikin
750 days ago
|
|
The plot is the time to encode/decode a list of 10,000 random integers in a uniform distribution. The throughput is 2-10 GiB/s[0] on my desktop, with portable scalar operations (no SIMD). Distributions skewed toward small values are slightly faster[1]. I believe a decoder than used SIMD would perform even faster since the format is friendly to the BMI instructions found on newer CPUs. You could also store the data as concat(prefix_bytes) + concat(payloads), which would really get things going. For scientific data with lots of multi-byte f64 values, the single branch for < 0xF0 might be unnecessary and undesirable. There are other formats that can be decoded by completely branchless SIMD, at the cost of larger sizes for very small values. For example, Google has a format that uses the first byte as a bitmask: https://static.googleusercontent.com/media/research.google.c... [0] vu128 is less affected by value size than VLQ/LEB128, so on fast hardware it decodes 10,000 8-bit values in about the same time as 10,000 64-bit values. [1] In the discussion on lobste.rs[2] a user wondered what the results would look like with a distribution containing lots of small values. A zipf distribution's results looks like <https://i.imgur.com/WGFtz6f.png>. [2] https://lobste.rs/s/qvoe7a/vu128_efficient_variable_length#c... |
|
I never fully groked the Zipf distribution but am I correct to conclude the worst case for vu128 would be alternating integers on either side of the branch condition?
Thanks for the link to the slides, also clever!