by grumpydba 2418 days ago
> Anyone considering a “time series database” should first set up a modern commercial column store, partition their tables on the time column, and time their workload. For any scan-oriented workload, it will crush a row store like Timescale.

Or you can set up a ClickHouse instance. It's a seriously promising and underrated product.


Clickhouse is a distributed relational columnar database. It competes with MemSQL, Vertica, Actian, Greenplum, and hosted options like Redshift, Bigquery, Snowflake, etc.
I assume they were pointing out you don't need to go to a commercial offering.
Clickhouse is good, but it's definitely made for a very limited purpose; it's not a general purpose SQL database. Which is fine, but the attraction with something like TimescaleDB is that your time series data can coexist with normal data.
Many time-series applications do not need very complicated SQL. This is why there are many time-series-focused databases that have no SQL support at all.

There is also a PostgreSQL Foreign Data Wrapper for ClickHouse, which allows you to run all the SQL that PostgreSQL supports, often with great performance.

If you use PostgreSQL's query engine on the Clickhouse data, you lose all the benefits of Clickhouse's columnar query engine, so that's not correct.
No, you don't lose them. The FDW supports pushdown of WHERE clauses and selects only the required columns. You can also create views in Clickhouse to make sure the joins are processed there.
You're right, but if the syntax you're using is not supported in Clickhouse, aggregate and predicate pushdowns won't work. This FDW (https://github.com/adjust/clickhouse_fdw) needs to map all the PostgreSQL functions / procedures to Clickhouse in order to take advantage of pushdown, so the only use case here is that you may want to join the data in Clickhouse with the data in PostgreSQL (or other FDW sources).
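To make the pushdown point concrete, here is a rough sketch of the decision a wrapper has to make for each filter: send it to the remote side, or evaluate it locally after fetching rows. This is not how clickhouse_fdw is implemented; `TRANSLATABLE`, `Predicate`, and `plan` are hypothetical names made up for illustration, and the function table is a toy.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical table of PostgreSQL functions the wrapper can translate
# to a remote equivalent. A real wrapper would need a far larger mapping.
TRANSLATABLE = {"lower": "lower", "upper": "upper"}


@dataclass
class Predicate:
    column: str
    op: str
    value: object
    func: Optional[str] = None  # optional function wrapped around the column


def plan(columns, predicates, table):
    """Split predicates into pushed-down SQL and ones left for local evaluation."""
    pushed, local = [], []
    for p in predicates:
        if p.func is None:
            # Plain comparison: safe to send to the remote side.
            pushed.append(f"{p.column} {p.op} {p.value!r}")
        elif p.func in TRANSLATABLE:
            # Known function: translate it and push it down.
            pushed.append(f"{TRANSLATABLE[p.func]}({p.column}) {p.op} {p.value!r}")
        else:
            # Unknown function: must be evaluated locally after fetching rows.
            local.append(p)

    # Fetch only the requested columns, plus any needed by local predicates.
    needed = list(columns)
    for p in local:
        if p.column not in needed:
            needed.append(p.column)

    # NOTE: repr() quoting is for illustration only, not safe SQL escaping.
    sql = f"SELECT {', '.join(needed)} FROM {table}"
    if pushed:
        sql += " WHERE " + " AND ".join(pushed)
    return sql, local


preds = [
    Predicate("ts", ">=", "2019-01-01"),
    Predicate("host", "=", "web1", func="lower"),
    Predicate("msg", "=", "x", func="my_custom_fn"),  # hypothetical untranslatable function
]
remote_sql, local_preds = plan(["ts", "value"], preds, "metrics")
print(remote_sql)
print([p.func for p in local_preds])
```

Every predicate that lands in the local list means more rows shipped back and filtered by PostgreSQL's row-oriented executor, which is where the columnar advantage starts to disappear.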
I don't know if I'd say very limited. I've used it for a lot of standard SQL work where my workload wasn't time series at all but aggregations and analytical queries.