We haven't benchmarked FastLanes directly against LanceDB yet, but here's a quick look at the compression side. LanceDB supports:

- FSST
- Bit-packing
- Delta encoding
- Opaque block codecs: GZIP, LZ4, Snappy, ZLIB

So in that regard, it's quite similar to Parquet: a mix of lightweight codecs and general-purpose block compression.

FastLanes, on the other hand, introduces Expression Encoding, a unified compression model that allows combining lightweight encodings to achieve better compression ratios. It also integrates multiple research efforts from CWI into a single file format:

- The FastLanes Compression Layout: Decoding >100 Billion Integers per Second with Scalar Code (VLDB '23)
  https://dl.acm.org/doi/pdf/10.14778/3598581.3598587
- ALP (Adaptive Lossless Floating-Point Compression) (SIGMOD '24)
  https://ir.cwi.nl/pub/33334/33334.pdf
- G-ALP (GPU-parallel variant of ALP) (DaMoN '25)
  https://azimafroozeh.org/assets/papers/g-alp.pdf
- White-box Compression (self-describing, function-based) (CIDR '20)
  https://www.cidrdb.org/cidr2020/papers/p4-ghita-cidr20.pdf
- CCC (Exploiting Column Correlations for Compression) (MSc Thesis '23)
  https://homepages.cwi.nl/~boncz/msc/2023-ThomasGlas.pdf
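As a rough illustration of what "combining lightweight encodings" can mean, here is a minimal Python sketch that chains three of them: delta encoding, a frame-of-reference shift on the deltas (so negative deltas become non-negative), and bit-packing at the smallest width that fits. This is only an assumed, simplified cascade; it is not the actual FastLanes Expression Encoding implementation or its on-disk layout, and the names `encode`, `decode`, `Encoded`, `bit_pack`, and `bit_unpack` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


def bit_pack(values: List[int], width: int) -> bytes:
    """Pack non-negative ints, `width` bits each, into little-endian bytes."""
    acc = 0
    for i, v in enumerate(values):
        acc |= v << (i * width)
    nbytes = (len(values) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def bit_unpack(data: bytes, width: int, count: int) -> List[int]:
    """Inverse of bit_pack: read `count` values of `width` bits each."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]


@dataclass
class Encoded:
    count: int       # number of original values
    base: int        # first value, the starting point for delta decoding
    min_delta: int   # frame of reference subtracted from every delta
    width: int       # bits per packed (shifted) delta
    packed: bytes    # bit-packed shifted deltas


def encode(values: List[int]) -> Encoded:
    """Hypothetical cascade: delta -> frame-of-reference -> bit-packing."""
    if not values:
        return Encoded(0, 0, 0, 0, b"")
    deltas = [b - a for a, b in zip(values, values[1:])]
    min_delta = min(deltas, default=0)
    shifted = [d - min_delta for d in deltas]        # all >= 0
    width = max(shifted, default=0).bit_length()     # smallest width that fits
    return Encoded(len(values), values[0], min_delta, width,
                   bit_pack(shifted, width))


def decode(enc: Encoded) -> List[int]:
    """Undo the cascade in reverse order: unpack, un-shift, prefix-sum."""
    if enc.count == 0:
        return []
    shifted = bit_unpack(enc.packed, enc.width, enc.count - 1)
    out = [enc.base]
    for s in shifted:
        out.append(out[-1] + s + enc.min_delta)
    return out


if __name__ == "__main__":
    timestamps = [1000, 1003, 1007, 1010, 1014, 1017]
    enc = encode(timestamps)
    assert decode(enc) == timestamps
    print(enc.width, len(enc.packed))  # 1 1
```

For the example, the deltas are 3, 4, 3, 4, 3; after subtracting the minimum delta (3) they fit in 1 bit each, so the five deltas pack into a single byte. Neither step alone gets there: plain bit-packing of the raw values would need 10 bits per value, and delta encoding without packing still stores full-width integers.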
Our current approach is pretty similar to Parquet for scalar types. We allow a mix of general-purpose and lightweight codecs for small types and require lightweight-only codecs for larger types (string, binary).
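The codec policy described above can be sketched as a simple lookup, assuming a simplified split between "large" types (string, binary) and everything else. The codec list is taken from the earlier summary of what LanceDB supports, but the `Codec` enum, the type-name strings, and `allowed_codecs` are hypothetical and do not reflect Lance's actual API. The sketch also ignores which lightweight codec actually applies to which type (e.g. FSST targets strings).

```python
from enum import Enum


class Codec(Enum):
    # Lightweight codecs
    FSST = "fsst"
    BITPACK = "bitpack"
    DELTA = "delta"
    # General-purpose (opaque block) codecs
    GZIP = "gzip"
    LZ4 = "lz4"
    SNAPPY = "snappy"
    ZLIB = "zlib"


LIGHTWEIGHT = frozenset({Codec.FSST, Codec.BITPACK, Codec.DELTA})
GENERAL = frozenset({Codec.GZIP, Codec.LZ4, Codec.SNAPPY, Codec.ZLIB})

# Hypothetical names for the "larger" types that are restricted.
LARGE_TYPES = frozenset({"string", "binary"})


def allowed_codecs(type_name: str) -> frozenset:
    """Large types get lightweight codecs only; small types may mix both."""
    if type_name in LARGE_TYPES:
        return LIGHTWEIGHT
    return LIGHTWEIGHT | GENERAL


if __name__ == "__main__":
    assert Codec.LZ4 not in allowed_codecs("string")
    assert Codec.LZ4 in allowed_codecs("int32")
```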
Nice work on the paper :)