by nu11ptr 1495 days ago
I was thinking the same thing as I was reading this. I doubt you could retain 100% random access. Instead, I think you would generally create "blocks" that are compressed over a time range that suits your application (e.g. 1 day blocks or 1 week blocks) and then, when picking date/times, you load X blocks and further refine the query after decompression (example: load May 5th block --> May 5th 8:10AM - 11:13AM). At least, that is what I have done in the past in my own homegrown apps. Essentially each block starts and terminates its own compression stream, so the # of blocks is a trade-off between compression efficiency and granularity.
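To make the block idea concrete, here is a minimal sketch of what I mean, not any particular database's implementation. It assumes one block per UTC day, (unix timestamp, float value) points, and zlib as the compressor; the names `BLOCK_SECONDS`, `build_blocks` and `query` are made up for illustration.

```python
import struct
import zlib
from collections import defaultdict

BLOCK_SECONDS = 86_400          # assumption: one block per UTC day
RECORD = struct.Struct("<qd")   # assumption: (int64 unix timestamp, float64 value)


def build_blocks(points):
    """Group points by day and compress each day's records independently."""
    raw = defaultdict(list)
    for ts, value in sorted(points):
        raw[ts // BLOCK_SECONDS].append(RECORD.pack(ts, value))
    # Each block is its own zlib stream, so it can be decompressed alone.
    return {day: zlib.compress(b"".join(records)) for day, records in raw.items()}


def query(blocks, start, end):
    """Return points with start <= ts < end, touching only overlapping blocks."""
    out = []
    if end <= start:
        return out
    first_day = start // BLOCK_SECONDS
    last_day = (end - 1) // BLOCK_SECONDS
    for day in range(first_day, last_day + 1):
        blob = blocks.get(day)
        if blob is None:
            continue  # no data stored for that day
        # Coarse step: decompress the whole block. Fine step: filter afterwards.
        for ts, value in RECORD.iter_unpack(zlib.decompress(blob)):
            if start <= ts < end:
                out.append((ts, value))
    return out
```

A query for 8:10AM - 11:13AM on one day decompresses exactly one block and throws away the rest of that day after the fact. Smaller blocks waste less work per query but give the compressor less context, which is the trade-off mentioned above.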

Correct, almost all timeseries databases divide the data into shards / partitions / whatever-it’s-called, each of which is then split by column, and each column is then compressed as a single unit.

Some databases use a fixed block size (e.g. as you mention, “1 day”), which is simple and stateless to manage, while others dynamically “split” blocks into smaller blocks (frequently called “ranges”) or merge them back later. The latter is significantly more complex, but it is a much better approach when you don’t know the right shard size in advance or need to handle highly varying workloads, e.g. a lot of traffic at specific times of day or on specific days of the week/month/year.
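As a rough illustration of the dynamic approach, here is a toy sketch of the split bookkeeping only. It is an assumption-laden example rather than how any specific database does it: `RangeIndex` and `MAX_POINTS` are hypothetical names, ranges live in memory uncompressed, and merging ranges back together is left out.

```python
import bisect

MAX_POINTS = 4  # assumption: tiny threshold so splits are easy to observe


class RangeIndex:
    """Hypothetical index of time ranges that split when they grow too large."""

    def __init__(self):
        self.starts = [float("-inf")]  # inclusive lower bound of each range
        self.ranges = [[]]             # per range: sorted list of (ts, value)

    def insert(self, ts, value):
        # Find the range whose lower bound is the largest one <= ts.
        i = bisect.bisect_right(self.starts, ts) - 1
        bisect.insort(self.ranges[i], (ts, value))
        if len(self.ranges[i]) > MAX_POINTS:
            self._split(i)

    def _split(self, i):
        points = self.ranges[i]
        split_ts = points[len(points) // 2][0]
        # First index whose timestamp is >= split_ts, so equal timestamps
        # never end up on both sides of the boundary.
        cut = bisect.bisect_left(points, (split_ts,))
        if cut == 0:
            return  # lower half shares one timestamp; no valid split point
        self.ranges[i:i + 1] = [points[:cut], points[cut:]]
        self.starts.insert(i + 1, split_ts)
```

With this scheme a burst of writes into one time window produces many small ranges there, while quiet periods stay in a few large ones. In a real system each range would then be compressed as its own unit, and the extra complexity comes from tracking the range boundaries and deciding when to merge.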