| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by andrewg 2429 days ago
	Very cool! The effect on page layout sounds like it would be pretty similar to Oracle's hybrid columnar compression[1], but they claim average compression ratio is more like 10:1. Any idea what would make so much of a difference? [1] https://www.oracle.com/technetwork/database/exadata/ehcc-twp...

1 comments

mfreed 2429 days ago

One guess from a super quick scan can be that we use type-specific compression algorithms. So if your table has one column of timestamps, another of floats, another int, another string, the database employs different compression algorithms (typically best-in-class) based on the column type.

Quick scan of the Oracle paper couldn't find specifics, other than something like this:

"Warehouse Compression provides two levels of compression: LOW and HIGH. Warehouse Compression HIGH typically provides a 10x reduction in storage, while Warehouse Compression LOW typically provides a 6x reduction"

That would at least suggest that they aren't doing anything type-specific like we are.

mfreed 2429 days ago

It may also be that we're operating in a slightly more delayed fashion (partially based on chunk boundaries), so we can organize across a lot larger range. For example, if you choose to segment by a device_id, it might scan 1M rows to assemble blocks/segments of device_ids, with each device_id having 1000 records to compress in a "mini column".

This also leads to significant query performance settings if you common filter by device_id, for example. Which are super common in time-series workloads for IT monitoring / devops / IOT / etc.