|
|
|
|
|
by danking00
750 days ago
|
|
If I’m following, this presents an alternative format and implementation to LEB128 which encodes and decodes substantially faster. Notably, the implementation is quite simple. Cool! And agreed that modern CPUs really suffer from branches. Should I interpret the plot to mean the average elapsed wall clock time per integer decoded/encoded? And can I conclude the throughput is the reciprocal? So about 100,000 integers per second or around a 1 MB/s of decompressed data. I know this is a bit unfair because the implementation is much more complex, but my first thought is why I would use vu128 instead of Lemire’s Stream VByte: https://arxiv.org/abs/1709.08990 A slight tangent but I stumbled on this library which stores floats XOR’ed with the previous float in the stream: https://github.com/velvia/compressed-vec it seems really clever to me! They reference “Gorilla: A Fast, Scalable, In-Memory Time Series Database” which in turn references two 2006 papers: “Fast Lossless Compression of Scientific Floating-Point Data” and “Fast and Efficient Compression of Floating-Point Data”. Frustratingly, the FB paper doesn’t benchmark their XOR-based floating point encoding but the earlier two papers do. |
|
I believe a decoder than used SIMD would perform even faster since the format is friendly to the BMI instructions found on newer CPUs. You could also store the data as concat(prefix_bytes) + concat(payloads), which would really get things going.
For scientific data with lots of multi-byte f64 values, the single branch for < 0xF0 might be unnecessary and undesirable. There are other formats that can be decoded by completely branchless SIMD, at the cost of larger sizes for very small values. For example, Google has a format that uses the first byte as a bitmask: https://static.googleusercontent.com/media/research.google.c...
[0] vu128 is less affected by value size than VLQ/LEB128, so on fast hardware it decodes 10,000 8-bit values in about the same time as 10,000 64-bit values.
[1] In the discussion on lobste.rs[2] a user wondered what the results would look like with a distribution containing lots of small values. A zipf distribution's results looks like <https://i.imgur.com/WGFtz6f.png>.
[2] https://lobste.rs/s/qvoe7a/vu128_efficient_variable_length#c...