|
I've heard wonderful things about ClickHouse, but every time I try to use it, I get stuck on "how do I get data into it reliably". I search around, and inevitably end up with "by combining clickhouse and Kafka", at which point my desire to keep going drops to zero. Are there any setups for reliable data ingestion into Clickhouse that don't involve spinning up Kafka & Zookeeper? |
Vector is a relatively simple ingest tool that supports lots of sources and sinks. It's very simple to run — just a config file and a single binary, and you're set. But it can do a fair amount of ETL (e.g. enriching or reshaping JSON), including some more advanced pipeline operators like joining multiple streams into one. It's maybe not as advanced as some ETL tools, but it covers a lot of ground.
Since you mention Kafka, I would also mention Redpanda, which is Kafka-compatible, but much easier to run. No Java, no ZooKeeper. I think you'd still want Vector here, with Vector connecting Redpanda to ClickHouse. Then you don't need the buffering that Vector provides, and Vector would only act as the "router" than pulls from Redpanda and ingests into ClickHouse.
Another option is RudderStack, which we also use for other purposes. It's a richer tool with a full UI for setting up pipelines, and so on.