by glogla 1440 days ago
I like a lot of things about ClickHouse, but one thing I'm afraid of is what happens when your data won't fit on a single machine. The replication and sharding seem pretty difficult, and from reading the documentation it feels like they might be pretty fragile.

I think once you reach that scale, systems that completely separate data and compute (like Snowflake or Trino + S3) are much less of a pain to run, since even if you completely blow up your compute, the data stays.

> Does anyone have some recommendations on what GUIs I could give our PMs to work with ClickHouse instead of writing queries? All SaaS I found didn't support ClickHouse yet or would cost us a newborn. Also what tools do your devs use when they work with ClickHouse data?

I think both Superset and Metabase were interesting choices - if you want to save money (at the expense of engineering time) you can self-host them.

5 comments

> I like a lot of things about ClickHouse, but one thing I'm afraid of is what happens when your data won't fit on a single machine.

ClickHouse sharding and replication are not that hard to master: the design is simple and the parts are visible. If you don't want the headache of distributed system management, run it in Kubernetes or use a managed service. ClickHouse-as-a-service is widely available from multiple vendors.

ClickHouse is also fast and cost-efficient at scale. It's a very good fit for multi-tenant SaaS analytics where you need fixed latency on responses to users.

Disclaimer: I work for Altinity, who run a cloud platform for ClickHouse.

eBay wrote a great blog post about clustering ClickHouse:

https://tech.ebayinc.com/engineering/ou-online-analytical-pr...

> The replication and sharding seem pretty difficult, and from reading the documentation it feels like they might be pretty fragile.

That's why trip.com replaced ClickHouse with StarRocks, a free (open source) OLAP database that handles sharding beautifully to give you linear scalability and better query performance for single-table or joined queries.

We built a managed ClickHouse service to help with exactly these difficulties. We handle sharding, clustering, ZooKeeper, patching, updates without downtime, and hybrid storage based on S3. https://double.cloud

How many ClickHouse-as-a-service offerings exist now? I stopped counting at 7 a few months ago (double.cloud was not on my list).

Could you share a list of them?

  - Firebolt (hard fork of ClickHouse)
  - Altinity
  - Gigapipe
  - Hydrolix
  - Bytehouse.cloud
  - https://clickhouse.com/ ("coming soon")
  - TiDB (their columnstore is a fork of ClickHouse)

I stopped tracking after this. I saw a few press releases go by announcing others as well, but I have lost track of them now.

The official ClickHouse Inc. is surely going to be under pressure to pull features out of their open source offering over time to differentiate themselves.

ClickHouse can run in a classic shared-nothing setup and in a "cloud-native" setup with shared storage. Setting up a distributed system with hundreds of machines can be difficult... but it's actually not more difficult than for any other distributed system at this scale.
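
For anyone wondering what the shared-nothing case looks like in practice, here is a rough Python sketch of weighted remainder routing, which is how I understand the ClickHouse Distributed table engine to pick a shard for each inserted row. The shard names, the weights, and the `pick_shard` helper are all made up for illustration; this is not ClickHouse code, and replication within each shard is left out.

```python
from bisect import bisect_right
from itertools import accumulate

# Hypothetical cluster layout: shard name -> weight (names and weights are invented).
SHARD_WEIGHTS = {"shard-a": 1, "shard-b": 1, "shard-c": 2}


def pick_shard(sharding_value: int, weights: dict[str, int]) -> str:
    """Map an integer sharding value to a shard using weighted remainder ranges."""
    names = list(weights)
    # Running totals mark the upper bound of each shard's remainder range:
    # weights 1, 1, 2 give bounds 1, 2, 4, so remainders 0 | 1 | 2-3.
    bounds = list(accumulate(weights[name] for name in names))
    remainder = sharding_value % bounds[-1]
    # The first bound strictly greater than the remainder identifies the shard.
    return names[bisect_right(bounds, remainder)]


if __name__ == "__main__":
    for user_id in range(8):
        print(user_id, pick_shard(user_id, SHARD_WEIGHTS))
```

With these weights, "shard-c" owns half of the remainder space, so it should receive roughly twice as many rows as either of the other two. In a real cluster each shard would also have one or more replicas holding the same data.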