by notamy 604 days ago
Clickhouse is great stuff. I use it for OLAP with a modest database (~600mil rows, ~300GB before compression) and it handles everything I throw at it without issues. I'm hopeful this new JSON data type will be better at a use case that I currently solve with nested tuples.

Similar for us, except 700mil rows in one table and 2.5 billion rows total. That's growing quickly because we started shoving OTEL data into the cluster. None of our queries seem to faze Clickhouse. It's like magic. The 48 cores per node also help.
Postgres should be good enough for 300GB, no?
I had a Postgres database where the main index (160 GB) was larger than the entire equivalent Clickhouse database (60 GB). And between the partitioning and the natural keys, the primary key index in Clickhouse was about 20k per partition * ~1k partitions.

Now, it wasn't a good schema to start with, and there was about a factor of 3 or 4 in size that could have been squeezed out, but Clickhouse was a factor of 20 better on on-disk size for what we were doing.
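
To put the index figures above in perspective, here is a rough back-of-the-envelope sketch. It assumes "20k per partition" means about 20 KB, which the comment does not actually state, and it uses decimal units throughout. The variable names are purely illustrative.

```python
# Back-of-the-envelope index sizes from the figures quoted above.
# Assumption: "20k per partition" means roughly 20 KB; the unit is not stated.
KB = 1000
GB = 1000 ** 3

ch_index_per_partition = 20 * KB   # ~20k per partition (assumed to be bytes)
ch_partitions = 1_000              # ~1k partitions
ch_index_total = ch_index_per_partition * ch_partitions

pg_index_total = 160 * GB          # main Postgres index
ch_database_total = 60 * GB        # entire equivalent Clickhouse database

print(f"Clickhouse primary key index: ~{ch_index_total / 1e6:.0f} MB")
print(f"Postgres index vs Clickhouse index: ~{pg_index_total / ch_index_total:,.0f}x")
print(f"Postgres index vs whole Clickhouse database: ~{pg_index_total / ch_database_total:.1f}x")
```

Under that assumption, the whole Clickhouse primary key index is on the order of 20 MB, small enough to sit in memory, while the Postgres index alone is larger than the entire Clickhouse database.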

At least in my experience, that's about when regular DBMSes kinda start to suck for ad-hoc queries. You can push them a bit farther for non-analytical use cases if you're really careful and have prepared indexes that assist every query you make, but that's rarely a luxury you have in OLAP-land.
It depends. If you want to do any kind of aggregation, counts, or count distinct, pg falls over pretty quickly.
Probably, but Clickhouse has been zero-maintenance for me, and my dataset is growing at 100-200 GB/month. Having Clickhouse's automatic compression makes me worry a lot less about disk space.
For write heavy workloads I find psql to be a dog tbh. I use it everywhere but am anxious to try new tools.

For truly big data (terabytes per month) we rely on BigQuery. For smaller data that is more OLTP write heavy we are using psql… but I think there is room in the middle.

Yes, but you're starting to get to the size where you need some real PG expertise to keep the wheels on. If your data is growing, CH will just work out of the box for a lot longer.