I really like ClickHouse. Discovered it recently, and man, it's such a breath of fresh air compared to the suboptimal solutions I used for analytics. It's so fast and the CLI is also a joy to work with.
I always dismissed ClickHouse because it's all super low level. Building a reliable system out of it requires a lot of internal knowledge. This is the only DB I know where you will have to deal with actual files on disk in case of problems.
However, I managed to look past that, and oh-my-god it is so fast. It's like the tool is optimized for raw speed and whatever you do with it is up to you.
Yeah, ClickHouse does feel like adult LEGO to me too: it lets you design your data structures and data storage layout, but doesn't force you to implement everything else. If you work at a large enough scale, that's usually exactly what you want from a system.
Same here. I come from a strong Postgres and Microsoft SQL Server background and I was able to get up to speed with it, ingesting real data from text files, in an afternoon. I was really impressed with the docs as well as the performance of the software.
Having a SQL-like syntax where everything feels like a normal DB helps a lot, I think. Of course, it works very differently behind the scenes, but not having to learn a bunch of new things just to use a new data model is a good approach.
I get why some create new dialects and languages, since that way there is less ambiguity and they are therefore harder to use incorrectly, but I think ClickHouse made the right tradeoffs here.
I remember a few years ago when the view of ClickHouse was that it was something "legacy" and "bulky", used by "the big guys", and there was not much discussion or opinion of it in spaces like this. Seems like it's come a long way.
Lots of Google Analytics competitors appeared between 2017 and 2023 for privacy reasons. And a lot of them started with normal Postgres or MySQL and then switched to ClickHouse, or simply started with ClickHouse knowing it could scale far better.
At least in terms of capability and reputation it was already well known by 2021 and certainly not legacy or bulky. At least on HN, ClickHouse is very often submitted and reaches the front page. Compare that to MySQL: when I tried submitting multiple times, no one was interested.
Edit: On another note, Umami is finally supporting ClickHouse! [1] Not sure how they are implementing it, because it still requires Postgres. But it should hopefully be a lot more scalable.
Or maybe "heavy duty"? Although I remember a lot of people were sceptical of CH simply because it came from Yandex, in Russia. And that was before the war.
ClickHouse earned that reputation. However, it was spun out of Yandex in 2021. That kickstarted a new wave of development and it’s gotten much better.
In my understanding, DuckDB doesn't have its own optimised storage that can accept writes (in the sense that ClickHouse does, where its native storage format gives you the best performance), and instead relies on e.g. reading data from Parquet and other formats. That makes sense for an embedded analytics engine on top of existing files, but might be a problem if you wanted to use DuckDB e.g. for real-time analytics, where the inserted data needs to be available for querying within a few seconds of being inserted. ClickHouse was designed for the latter use case, but at the cost of being a full-fledged standalone service by design. There are embedded versions of ClickHouse, but they are much bulkier and generally less ergonomic to use (although that's a personal preference).
Yeah, I don't really know. Though in the OLAP space that issue/discussion is really old. There's a good chance that performance is dramatically better now, though YMMV.