by wmertens
1014 days ago
> CONCLUSION
>
> We introduced BtrBlocks, an open columnar compression format for data lakes. By analyzing a collection of real-world datasets, we selected a pool of fast encoding schemes for this use case. Additionally, we introduced Pseudodecimal Encoding, a novel compression scheme for floating-point numbers. Using our sample-based compression scheme selection algorithm and our generic framework for cascading compression, we showed that, compared to existing data lake formats, BtrBlocks achieves a high compression factor, competitive compression speed and superior decompression performance. BtrBlocks is open source and available at https://github.com/maxi-k/btrblocks.

I doubt I'd ever use columnar compression again, as I found it too difficult to fight DBAs over keeping the original sorting and schema preserved in an optimal way. I do find it really interesting, though.