by mfreed 2429 days ago
It may also be that we're operating in a slightly more delayed fashion (partially based on chunk boundaries), so we can organize data across a much larger range of rows. For example, if you choose to segment by device_id, it might scan 1M rows to assemble blocks/segments per device_id, with each device_id having 1000 records to compress in a "mini column".
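
To make the segmenting idea concrete, here's a rough Python sketch. It is not the actual implementation: the function name `segment_by`, the `chunk_size` parameter, and the row layout are all made up for illustration. It just groups one chunk of rows by a key and pivots each group into per-column arrays, which is roughly what a "mini column" per device_id would look like before compression.

```python
from collections import defaultdict

def segment_by(rows, key, chunk_size=1_000_000):
    """Group one chunk of rows by `key`, pivoting each group into column arrays.

    Hypothetical helper: names and layout are assumptions, not a real API.
    """
    chunk = rows[:chunk_size]  # only look at one chunk's worth of rows
    groups = defaultdict(lambda: defaultdict(list))
    for row in chunk:
        segment = groups[row[key]]
        for col, value in row.items():
            if col != key:
                # each column of a segment is one "mini column"
                segment[col].append(value)
    return {k: dict(v) for k, v in groups.items()}

# Toy data: 12 rows interleaved across 3 devices.
rows = [
    {"device_id": i % 3, "ts": i, "temp": 20.0 + (i % 3)}
    for i in range(12)
]
segments = segment_by(rows, "device_id")
print(segments[0]["ts"])
```

Each value in `segments` holds contiguous arrays for a single device, so values that tend to be similar sit next to each other, which is what a per-column compressor would want.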

This also leads to significant query performance improvements if you commonly filter by device_id, for example, which is super common in time-series workloads for IT monitoring / devops / IoT / etc.
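
A toy illustration of why the filter gets cheaper, again only a sketch under assumptions: the `segments` dict, the device names, and `avg_cpu` are hypothetical and stand in for data already grouped by device_id. A filter on device_id then reads one segment instead of scanning every row.

```python
# Hypothetical segmented layout: one entry per device_id, each holding column arrays.
segments = {
    "dev-a": {"ts": [1, 2, 3], "cpu": [0.2, 0.3, 0.4]},
    "dev-b": {"ts": [1, 2, 3], "cpu": [0.9, 0.8, 0.7]},
    "dev-c": {"ts": [1, 2, 3], "cpu": [0.5, 0.5, 0.5]},
}

def avg_cpu(segments, device_id):
    """Average cpu for one device, touching only that device's segment."""
    cols = segments.get(device_id)
    if cols is None:
        return None  # no data for this device
    return sum(cols["cpu"]) / len(cols["cpu"])

result = avg_cpu(segments, "dev-b")
print(result)
```

The other devices' segments are never read (or, in a compressed store, never decompressed), which is where the savings would come from.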