Hacker News
by danielbln 2289 days ago
In our experience, on a significantly smaller scale, ClickHouse is vastly easier to operate than Druid, whose many components each have their own knobs and dials to configure and all have to be orchestrated together.
2 comments

Druid committer here. FWIW, Druid was designed to run on huge clusters, and that really shows up in the multi-process architecture. The idea is that if you separate the components needed for ingestion, historical processing, query routing, and coordination, you get two benefits: they don't interfere with each other (spikes in ingestion load won't interfere with the ability to query historical data), and you can scale each one individually for your workload. You could even auto-scale some of them. For example, the original Druid cluster was operated with load-based auto-scaling for the ingestion processes.

That being said, we are currently working on reducing the number of processes to 4 (from the current 6) for a "standard" setup. The main reason is that at smaller scale there is less benefit to running a larger number of separate processes.

We're also working on removing some of the knobs. Actually, depending on what version you originally looked at, many of them might already be gone.

It's been a few years since we evaluated Druid. It's great to hear that you're simplifying things, especially for smaller setups!
Yes, this is definitely also true.

Druid complexity is coming down a bit compared to where it started. These days you need brokers, middlemanagers and historicals - for queries, ingestion and storage respectively.

In the past, batch ingestion also required Hadoop, but there is now a native parallel batch ingestion system that runs on the middlemanagers as worker tasks and can read from S3, GCS, or existing Druid segments.
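
For anyone who hasn't seen one, a native parallel batch task is described by a JSON spec. Here is a minimal sketch in Python that builds such a spec, assuming a Druid release that supports `inputSource` (older releases configured this differently). The datasource name, bucket, column names, and subtask count are all hypothetical placeholders, not something from a real cluster.

```python
import json


def build_parallel_index_spec(datasource, s3_prefix, max_subtasks=4):
    """Build a native parallel batch ingestion task spec as a plain dict.

    All column names below are illustrative assumptions.
    """
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Hypothetical timestamp column and dimensions.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["country", "page"]},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "day",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                # Read every object under the given S3 prefix.
                "inputSource": {"type": "s3", "prefixes": [s3_prefix]},
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                # Upper bound on worker subtasks running at the same time.
                "maxNumConcurrentSubTasks": max_subtasks,
            },
        },
    }


spec = build_parallel_index_spec("example_events", "s3://example-bucket/events/")
print(json.dumps(spec, indent=2))
```

The resulting JSON is what you would submit to the Overlord's task endpoint (`/druid/indexer/v1/task`). As far as I know, re-ingesting from existing segments works the same way with a `druid` input source in place of `s3`, but check the docs for your version.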

Druid is by far the more complex of the two, but you get a lot for it, and with k8s it's not as hard to run and manage as it was in the past.