by derN3rd 1437 days ago
My company recently started to invest more in analytics, and we had to decide which backend/database to use. After some research we settled on ClickHouse, and we couldn't be happier.

- Super easy to set up
- Easy to back up
- Fast configuration (though the documentation could be better in some parts)
- SQL dialect similar to the MySQL our devs already use

The only negative points I could find so far:
- No "good" management GUI like phpMyAdmin, pgAdmin, or MySQL Workbench
- Caching/batching layers aren't built in; you need external software for them

As we are a fairly small company, any other analytical database would have cost us far more money and time. Friends of ours recently hired a group of data engineers/analysts who brought all their AWS knowledge and tooling with them. They basically ended up with the same outcome as us, while we have only 5% of their costs and all our devs are able to either ingest or query data.

EDIT: Does anyone have recommendations for GUIs I could give our PMs so they can work with ClickHouse without writing queries? The SaaS tools I found either don't support ClickHouse yet or would cost us a newborn. Also, what tools do your devs use when they work with ClickHouse data?


I like a lot of things about ClickHouse, but one thing I'm afraid of is what happens when your data won't fit on a single machine. Replication and sharding seem pretty difficult, and from reading the documentation it feels like they might be pretty fragile.

I think once you reach that scale, systems that completely separate data and compute (like Snowflake or Trino + S3) are much less of a pain to run, since even if you completely blow up your compute, the data survives.

> Does anyone have some recommendations on what GUIs I could give our PMs to work with ClickHouse instead of writing queries? All SaaS I found didn't support ClickHouse yet or would cost us a newborn. Also what tools do your devs use when they work with ClickHouse data?

I think both Superset and Metabase are interesting choices. If you want to save money (at the expense of engineering time), you can self-host them.

> I like a lot of things about Clickhouse but one thing I'm afraid of is what happens when your data won't fit on a single machine.

ClickHouse sharding and replication are not that hard to master; they're simple and the parts are visible. If you don't want the headache of managing a distributed system, run it in Kubernetes or use a managed service. ClickHouse-as-a-service is widely available from multiple vendors.

ClickHouse is also fast and cost-efficient at scale. It's a very good fit for multi-tenant SaaS analytics where you need fixed latency on responses to users.

Disclaimer: I work for Altinity, who run a cloud platform for ClickHouse.

eBay wrote a great blog post about clustering ClickHouse:

https://tech.ebayinc.com/engineering/ou-online-analytical-pr...

> The replication and sharding seems pretty difficult and from reading the documentation feels like it might be pretty fragile.

That's why trip.com replaced ClickHouse with StarRocks, a free (open-source) OLAP database that handles sharding beautifully, giving you linear scalability and better query performance for both single-table and join queries.

We built a managed ClickHouse service to help with exactly these difficulties. We handle sharding, clustering, ZooKeeper, patching, updates without downtime, and hybrid storage based on S3. https://double.cloud
How many ClickHouse-as-a-service offerings exist now? I stopped counting at 7 a few months ago (double.cloud wasn't on my list).
Could you share a list of them?

  - Firebolt (hard fork of ClickHouse)
  - Altinity
  - Gigapipe
  - Hydrolix
  - Bytehouse.cloud
  - https://clickhouse.com/ ("coming soon")
  - TiDB (its columnstore is a fork of ClickHouse)
I stopped tracking after this. I saw press releases announcing a few others as well, but I've lost track of them.

The official ClickHouse Inc. is surely going to be under pressure to pull features out of its open-source offering over time to differentiate itself.

ClickHouse can run in a classic shared-nothing setup and in a "cloud-native" setup with shared storage. Setting up a distributed system with hundreds of machines can be difficult, but it's no more difficult than with any other distributed system at this scale.
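For context on what the shared-nothing side involves: as I understand the Distributed table engine docs, each inserted row is routed by taking the value of the sharding expression modulo the total weight of all shards, and the row goes to the shard whose half-open interval [prev_weights, prev_weights + weight) contains that remainder. A rough Python sketch of that routing rule follows; `pick_shard` and the example weights are hypothetical illustrations, not ClickHouse code.

```python
from typing import Sequence


def pick_shard(sharding_key: int, weights: Sequence[int]) -> int:
    """Return the index of the shard a row is routed to (hypothetical helper).

    The key's remainder modulo the total weight is compared against each
    shard's half-open interval [prev_weights, prev_weights + weight).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("shard weights must sum to a positive number")
    remainder = sharding_key % total  # always in [0, total) in Python
    prev = 0
    for index, weight in enumerate(weights):
        if prev <= remainder < prev + weight:
            return index
        prev += weight
    raise AssertionError("unreachable: remainder is always below the total weight")


# Example weights: two shards weighted 9 and 10, so residues 0..8 go to
# shard 0 and residues 9..18 go to shard 1.
weights = [9, 10]
counts = [0, 0]
for key in range(19 * 100):
    counts[pick_shard(key, weights)] += 1
print(counts)  # [900, 1000]
```

In a real cluster the sharding expression is typically something like `rand()` or a hash of a column, and the weights come from the cluster configuration; replication within each shard is a separate mechanism this sketch ignores.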
Just in case, here's the built-in batching: https://clickhouse.com/docs/en/operations/settings/settings/... A list of mature UIs (all of which have SaaS offerings, I believe): https://clickhouse.com/docs/en/connect-a-ui And all third-party UIs: https://clickhouse.com/docs/en/interfaces/third-party/gui
Thanks!

I remember trying the built-in batching, but we had some trouble with it, so we switched to <https://github.com/nikepan/clickhouse-bulk>, which has worked without any issues since then.
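For anyone curious what a collector like clickhouse-bulk does conceptually: it buffers many small inserts and forwards them as fewer, larger ones, flushing when either a row-count or a time limit is reached. A minimal Python sketch of that pattern, assuming a hypothetical `send_batch` callback that would perform the real INSERT; `InsertBatcher` is an illustrative name, this is not clickhouse-bulk's actual implementation, and it talks to no real server.

```python
import time
from typing import Callable, List, Sequence


class InsertBatcher:
    """Buffers rows and flushes them in one call once a size or age limit is hit.

    `send_batch` is a hypothetical callback standing in for the real INSERT
    (e.g. a request to ClickHouse); it receives the whole batch as a list.
    """

    def __init__(self, send_batch: Callable[[List[Sequence]], None],
                 max_rows: int = 1000, max_age_s: float = 1.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.send_batch = send_batch
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.clock = clock  # injectable so tests can fake the passage of time
        self.rows: List[Sequence] = []
        self.first_row_at = 0.0

    def add(self, row: Sequence) -> None:
        # Remember when the current batch started so old rows don't wait forever.
        if not self.rows:
            self.first_row_at = self.clock()
        self.rows.append(row)
        if (len(self.rows) >= self.max_rows
                or self.clock() - self.first_row_at >= self.max_age_s):
            self.flush()

    def flush(self) -> None:
        # Hand the buffered rows over in one call, then start a fresh batch.
        if self.rows:
            batch, self.rows = self.rows, []
            self.send_batch(batch)


# Usage: collect "inserts" in a list instead of sending them anywhere.
sent: List[List[Sequence]] = []
batcher = InsertBatcher(sent.append, max_rows=3, max_age_s=60.0)
for i in range(7):
    batcher.add((i, f"event-{i}"))
batcher.flush()  # push out the final partial batch
print([len(batch) for batch in sent])  # [3, 3, 1]
```

One simplification: the age limit is only checked when a new row arrives, so a real collector would also flush from a background timer to avoid rows sitting in the buffer during quiet periods.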

I'll have a look at the UIs listed there.

Tableau is second to none for dataviz, imo. It's much more versatile than the competition: you can build interactive visualizations, set thresholds, color-code and label data, etc. Of course it costs a lot, but if dataviz delivers real value for you, it's more than worth the cost.
ClickHouse support for TablePlus landed 2 months ago: https://github.com/TablePlus/TablePlus/issues/670