Hacker News
by andridk 1858 days ago
Sorry if this is obvious, but... What is ClickHouse?

Officially, it's a column-oriented database. This means that, internally, it stores all the values of a column together rather than storing each row's values together. In practice, it means that it's optimized for calculating analytics over large datasets.
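A minimal Python sketch of the difference, assuming a made-up three-column metrics table held in memory. Real engines like ClickHouse work on compressed on-disk blocks, not Python lists, so this only illustrates the layout: an aggregate over one column has to walk every full row in the row layout, but only touches that one column in the columnar layout.

```python
# Hypothetical toy table: each row is (ts, host, latency_ms).
rows = [
    (1, "a", 12.0),
    (2, "b", 15.0),
    (3, "a", 9.0),
    (4, "c", 20.0),
]

def to_columns(rows):
    """Pivot a row-oriented list of tuples into one list per column."""
    ts, host, latency = zip(*rows)
    return {"ts": list(ts), "host": list(host), "latency_ms": list(latency)}

def avg_latency_row_store(rows):
    # Row layout: every full row is visited even though only one field is needed.
    return sum(r[2] for r in rows) / len(rows)

def avg_latency_column_store(columns):
    # Column layout: only the single contiguous latency column is read.
    col = columns["latency_ms"]
    return sum(col) / len(col)

columns = to_columns(rows)
print(avg_latency_row_store(rows))        # 14.0
print(avg_latency_column_store(columns))  # 14.0
```

Both give the same answer; the point is how much data each layout must read to get it, which is what dominates cost once tables have many columns and billions of rows.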

I've found, from personal experience, that it makes a good replacement for time-series databases, even though it's technically not a time-series database. My employer migrated our KPIs and other metrics from InfluxDB to ClickHouse a couple of years ago, and the drastic performance improvements were well worth the time it took to migrate our data. It also helped that ClickHouse uses a subset of SQL, unlike InfluxDB, which uses a superficially SQL-like but, in practice, very different proprietary language.

You may be interested in https://github.com/influxdata/influxdb_iox

It's a column-store database built on the DataFusion SQL engine, with persistence via Parquet files stored directly on object stores (e.g. S3).

It's still under development.

Every high-volume Influx/Grafana implementation I've used has been a disaster. I'm now at a place that uses ClickHouse, and I can finally see the utility of Grafana.
How so? I run a moderately sized InfluxDB stack (2 × ~2 TB of data) with Grafana, and it works pretty well.
The shape of the data matters, in particular the cardinality of the tags. If ClickHouse works well for you, chances are that your use case will be well served by influxdb_iox too.
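To make "cardinality of the tags" concrete: in InfluxDB's data model, a series is identified by its measurement plus its tag set, and InfluxDB's documentation treats high series cardinality as a common cause of performance problems. The sketch below is a toy counter over hypothetical points (the field names are made up for illustration); it shows how a single tag that is unique per point, such as a request ID, turns every point into its own series.

```python
from itertools import product

def series_cardinality(points):
    """Count distinct (measurement, tag set) combinations among the points."""
    return len({(p["measurement"], frozenset(p["tags"].items())) for p in points})

# Hypothetical points: 3 hosts x 2 regions, each combination written twice.
points = [
    {"measurement": "cpu", "tags": {"host": h, "region": r}, "value": 1.0}
    for h, r in product(["h1", "h2", "h3"], ["us", "eu"])
    for _ in range(2)
]
print(series_cardinality(points))  # 6

# Adding a tag that is unique per point (e.g. a request ID) makes every point its own series.
noisy = [
    {**p, "tags": {**p["tags"], "request_id": str(i)}}
    for i, p in enumerate(points)
]
print(series_cardinality(noisy))  # 12
```

With low-cardinality tags the series count stays small no matter how many points are written; with a per-point tag it grows with the data, which is the kind of "shape" that can hurt a series-indexed store while a columnar store simply sees another column.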
Thanks! Very informative
From what I've read, it is not a column-oriented DB. It's more like Druid or Pinot.
ClickHouse/Druid/Pinot are all column stores (column-oriented databases). ClickHouse is a relational engine, while Druid/Pinot are a different (and older) design using heavy indexing and pre-aggregation. All of them, though, store table data as per-column segments, which is the defining feature that leads to high compression and I/O performance.
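Why per-column segments compress well can be sketched with run-length encoding, used here only as the simplest illustration; ClickHouse's actual codecs (e.g. LZ4, ZSTD, Delta) differ. The data below is hypothetical. Stored per column, values of the same type and often the same value sit next to each other; interleaved row by row, those repeats disappear.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# Hypothetical table sorted by region, stored column by column.
region = ["eu"] * 4 + ["us"] * 3
status = [200, 200, 200, 500, 200, 200, 200]

print(run_length_encode(region))  # [('eu', 4), ('us', 3)]
print(run_length_encode(status))  # [(200, 3), (500, 1), (200, 3)]

# The same data interleaved row by row has no adjacent repeats to exploit.
interleaved = [x for pair in zip(region, status) for x in pair]
print(len(run_length_encode(interleaved)))  # 14
```

Seven region values shrink to two runs and seven status values to three, while the row-interleaved version of the same fourteen values yields fourteen runs, i.e. no compression at all under this scheme.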

There's also the confusingly named "wide-column" database type, such as Cassandra, but that is really just an advanced or nested key/value store rather than what people would consider "columns".
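A rough sketch of the nested key/value model this describes, loosely modeled on Cassandra's partition key → clustering key → columns layout. This is a toy in-memory map with hypothetical function names (`put`, `get_partition`), not how Cassandra stores data on disk. Note that each row's columns live together inside a nested map, so this is not the per-column-segment layout of a column store.

```python
from collections import defaultdict

# Hypothetical wide-column "table": partition key -> clustering key -> {column: value}.
store = defaultdict(dict)

def put(partition_key, clustering_key, columns):
    """Upsert a row: merge its columns into the nested map for its partition."""
    store[partition_key].setdefault(clustering_key, {}).update(columns)

def get_partition(partition_key):
    """Return the rows of one partition ordered by clustering key."""
    return sorted(store[partition_key].items())

put("sensor-1", 100, {"temp": 21.5})
put("sensor-1", 90, {"temp": 20.9, "humidity": 40})
put("sensor-2", 100, {"temp": 18.0})

print(get_partition("sensor-1"))
# [(90, {'temp': 20.9, 'humidity': 40}), (100, {'temp': 21.5})]
```

Rows in the same partition can carry different sets of "columns", which is where the "wide" name comes from; lookups are by key, not scans over a single column.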

In a nutshell: it's like MySQL, but for analytic applications.