|
|
|
|
|
by dragontamer
801 days ago
|
|
I'm not exceptionally good with compression, but I did a few experiments. I've noticed that array-of-structs form is harder to compress than struct-of-arrays form. This is relevant because a floating-point number is really a struct containing 3 values: 1-bit sign, 8-bit exponent, and 23-bit mantissa. -------- If you have 1000 floats, I would expect 1000-sign bits in order to be more compressible (whatever compression algorithm you use, it would better determine the pattern of signs). Then 8000-bits of exponent would also be more compressible, because all the exponents would likely follow its own pattern. Finally, the 23-bits of mantissa would probably be difficult to impossible for compression. But "extracting it out" before running a compression algorithm will give the sign + exponents more opportunity for comrpession. -------- So yeah, just think about your data. And favor structure-of-arrays form for better compression. |
|
It’s worth noting that the algorithm in the article, and it’s cousin, delta encoding, both do already take advantage of any cases where the sign & exponent bits don’t change with every sample, and/or where the mantissa ULPs are constant for a while at a time. It’d be interesting to compare the explicit SoA of floats compression idea to XOR or delta compression, but I think it’s also interesting in a meta kinda of way that these two seemingly different ideas have similar outcomes.