| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by mfreed 864 days ago
	Check out how TimescaleDB adds columnar compression to PostgreSQL, typically saving 95% of storage overhead: https://www.timescale.com/blog/building-columnar-compression...

2 comments

tbragin 864 days ago

However if you really want to optimize data currently residing in Postgres for analytical workloads, as the original comment suggests - consider moving to a dedicated OLAP DB like ClickHouse.

See results from Gitlab benchmarking ClickHouse vs TimescaleDB: https://gitlab.com/gitlab-org/incubation-engineering/apm/apm...

Key findings:

* ClickHouse has a much smaller data volume footprint in all cases by almost a factor of 10.

* There are very few ClickHouse queries that have >1s latency at q95. TimescaleDB has multiple >1s latencies, including a few in the range of 15-25s.

Disclaimer: I work at ClickHouse

link

osigurdson 864 days ago

What we ended up doing is maintain meta-data in Postgres but time series data is stored in ClickHouse. Thanks for making / working on ClickHouse. I appreciate it very much.

link

mfreed 864 days ago

That PoC benchmark didn't turn on Timescale's columnar compression, which every real deployment uses. So misleading at best.

(Timescaler)

link

tbragin 864 days ago

Compression helped, but not enough in our experiments https://benchmark.clickhouse.com/#eyJzeXN0ZW0iOnsiQWxsb3lEQi...

link

mfreed 864 days ago

All depends on the benchmark :)

https://www.timescale.com/blog/what-is-clickhouse-how-does-i...

link

boulos 864 days ago

Do you have a version using the same instance type? Seems weird to mix them.

link

tbragin 864 days ago

Indeed, ClickHouse results were run on an older instance type of the same family and size (c5.4xlarge for ClickHouse and c6a.4xlarge for Timescale), so if anything ClickHouse results are at a slight disadvantage.

This is an open source benchmark - we'd love contributions from Timescale enthusiasts if we missed something: https://github.com/ClickHouse/ClickBench/

link

boulos 864 days ago

Eh, c6a is also an AMD Rome which has worse memory bandwidth at the tails and weaker per thread performance than Cascadelake (c5). I don't understand anything about this particular benchmark, but I wouldn't compare them simply as "older vs newer".

link

osigurdson 864 days ago

TimeScale was certainly the first choice as we were already using Postgres. However, we could not get it to perform well as times are simulated / non monotonic. We also ultimately need to be able to manage low trillions of points in the long run. InfluxDB was also evaluated but faced a number of issues as well (though I am certain both it and TimeScale would work fine for some use cases).

I think perhaps because ClickHouse is a little more general purpose, it was easier to map our use case to it. Also, one thing I appreciate about ClickHouse is it doesn't feel like a black box - once you understand the data model it is very easy to reason about what will work and what will not.

link

out_of_protocol 864 days ago

Did you look at something Parquet-based? Different approach, could work on very large time-series-like datasets. E.g. snowflake, Apache Iceberg

link