Hacker News
by buremba 2792 days ago
Use TimescaleDB if you have time-series data; if you want to scale out your OLTP workload, then Citus is what you're looking for.
1 comment

Citus is also used for large time-series / analytics use cases e.g. https://www.citusdata.com/customers/heap

There's a question of what you actually want to do with the time-series data. If you don't expect to have much data, or just want to store it and maybe occasionally query it, then a single server with partitioning (e.g. through pg_partman or Timescale) might be enough. If you want to build an analytical dashboard that needs to remain fast even if you're dealing with many users and terabytes of data per day, then you probably need Citus.

Citus can load, aggregate and query the data in parallel using all the cores in the cluster. It also supports Postgres' native partitioning and pg_partman: https://www.citusdata.com/blog/2018/01/24/citus-and-pg-partm...
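To make the single-server partitioning option concrete, here's a rough sketch that generates monthly partitions using Postgres' native declarative partitioning (roughly the bookkeeping pg_partman automates for you). The table name `events`, its columns, and the helper names are made up for illustration; the script only builds the DDL strings, it doesn't connect to a database.

```python
from datetime import date
from typing import List

# Hypothetical parent table; the column names are assumptions for this sketch.
PARENT_DDL = (
    "CREATE TABLE events (event_time timestamptz NOT NULL, payload jsonb) "
    "PARTITION BY RANGE (event_time);"
)


def month_start(d: date, offset: int) -> date:
    """Return the first day of the month `offset` months after d's month."""
    index = d.year * 12 + (d.month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def monthly_partition_ddl(parent: str, first: date, months: int) -> List[str]:
    """Build one CREATE TABLE ... PARTITION OF statement per month."""
    statements = []
    for i in range(months):
        lo = month_start(first, i)
        hi = month_start(first, i + 1)  # upper bound is exclusive in Postgres
        name = f"{parent}_{lo:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )
    return statements


if __name__ == "__main__":
    print(PARENT_DDL)
    for stmt in monthly_partition_ddl("events", date(2018, 11, 1), 3):
        print(stmt)
```

You'd run the printed statements through psql or your migration tool of choice; queries that filter on `event_time` can then skip partitions outside the requested range.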

AFAIK Heap uses Citus but also has an internal partitioning scheduler for their customer event data, so I don't think they're a good example. Timescale doesn't support scaling out yet, but it's on their roadmap, so let's wait for them to implement it before drawing a fair conclusion.

If you're going to create roll-up tables and power your dashboard from those tables, you're fine with either option IMO. Cloudflare was also using Citus for exactly this use case before they switched to ClickHouse.
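For anyone unfamiliar with the roll-up pattern, here's a minimal sketch of the idea: raw events get pre-aggregated into a small hourly table, and the dashboard reads only that table. It uses Python's built-in sqlite3 purely so it runs anywhere; the table names `events` and `events_hourly` and the sample rows are invented, and on Postgres you'd typically bucket with `date_trunc('hour', ...)` instead of SQLite's `strftime`.

```python
import sqlite3


def build_hourly_rollup(conn: sqlite3.Connection) -> None:
    """Rebuild the events_hourly roll-up from the raw events table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events_hourly ("
        "hour TEXT, site TEXT, hits INTEGER, PRIMARY KEY (hour, site))"
    )
    # Full rebuild for simplicity; a real job would refresh only recent hours.
    conn.execute("DELETE FROM events_hourly")
    conn.execute(
        "INSERT INTO events_hourly (hour, site, hits) "
        "SELECT strftime('%Y-%m-%d %H:00', event_time), site, COUNT(*) "
        "FROM events GROUP BY 1, 2"
    )


def demo() -> list:
    """Load a few made-up events and return the roll-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_time TEXT, site TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            ("2018-11-01 10:15:00", "a"),
            ("2018-11-01 10:45:00", "a"),
            ("2018-11-01 11:05:00", "a"),
            ("2018-11-01 10:30:00", "b"),
        ],
    )
    build_hourly_rollup(conn)
    rows = conn.execute(
        "SELECT hour, site, hits FROM events_hourly ORDER BY hour, site"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The dashboard query then touches one row per site per hour instead of every raw event, which is why either system can serve it comfortably.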

If you have ad-hoc use cases for time-series data, Timescale might be a better option because it's built for exactly that. It knows the semantics of the data, so it can partition the data in an optimized way and apply optimizations such as parallelized operations and chunk re-sizing. In that sense, it's comparable to InfluxDB, not Citus.