Very cool! The effect on page layout sounds like it would be pretty similar to Oracle's hybrid columnar compression[1], but they claim average compression ratio is more like 10:1. Any idea what would make so much of a difference?
One guess from a super quick scan can be that we use type-specific compression algorithms. So if your table has one column of timestamps, another of floats, another int, another string, the database employs different compression algorithms (typically best-in-class) based on the column type.
Quick scan of the Oracle paper couldn't find specifics, other than something like this:
"Warehouse Compression provides two levels of compression: LOW and HIGH. Warehouse Compression HIGH typically provides a 10x reduction in storage, while Warehouse Compression LOW typically provides a 6x reduction"
That would at least suggest that they aren't doing anything type-specific like we are.
It may also be that we're operating in a slightly more delayed fashion (partially based on chunk boundaries), so we can organize across a lot larger range. For example, if you choose to segment by a device_id, it might scan 1M rows to assemble blocks/segments of device_ids, with each device_id having 1000 records to compress in a "mini column".
This also leads to significant query performance settings if you common filter by device_id, for example. Which are super common in time-series workloads for IT monitoring / devops / IOT / etc.
Quick scan of the Oracle paper couldn't find specifics, other than something like this:
"Warehouse Compression provides two levels of compression: LOW and HIGH. Warehouse Compression HIGH typically provides a 10x reduction in storage, while Warehouse Compression LOW typically provides a 6x reduction"
That would at least suggest that they aren't doing anything type-specific like we are.