Hacker News
by thomoco 1249 days ago
Could you please elaborate on your comments and possible misconceptions about ClickHouse? Proven stability, massive scale, predictability, native SQL, and industry-best performance are all well-recognized characteristics of ClickHouse, so your comments here seem a bit biased.

I am interested to learn more about your point of view, as well as, tangentially, the strategic vision of MotherDuck as a company.

(VP Support at ClickHouse)

Speaking from nearly a decade of working on BigQuery and a year at Firebolt.

- Stability. It OOMs; your CTO mentioned that last week.

- Correctness. It does not always return correct results. I believe your team is aware of cases in which your very own benchmarks showed ClickHouse returning incorrect results.

- Scale. The distributed plan is broken, and I'm not sure ClickHouse even has shuffle.

- SQL. It is very non-standard.

- Knobs. Lots of knobs that are poorly documented. It's unclear which are mandatory. You have to restart for most.

Don't get me wrong: I love open source, and I love what ClickHouse has done. I am not a fan of overselling. There are problems with ClickHouse, and trying to sell it as a superset of the modern cloud data warehouse (CDW) is not doing users any favors.

As an engineer who admires the work done by DuckDB, I'm disappointed that a co-founder of its commercial evolution (MotherDuck) is spreading FUD about competitors before it is even part of the competitive conversation.

> Stability. It OOMs; your CTO mentioned that last week.

I ran ClickHouse clusters for years with zero stability issues (even as a beginner at the time) at a video game studio with extremely high data volumes and real-time needs. Using online materialized views, I was able to build rollups of vital KPIs at the millisecond level while sustaining thousands of queries per second. Stability was never a concern of ours, and quite frankly, we were kind of blown away.
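A rough model of the rollup pattern described above, for readers who haven't used it: a ClickHouse materialized view behaves like an insert trigger that aggregates each newly inserted block into a target table, whose partial rows are combined later by background merges. The Python below is only an in-memory sketch of that idea under those assumptions; `RollupView`, its methods, and the one-minute bucket are made up for illustration and are not ClickHouse APIs.

```python
from collections import defaultdict


class RollupView:
    """Hypothetical in-memory model of an insert-triggered KPI rollup.

    Each inserted batch is pre-aggregated into partial (count, sum) states
    per (bucket, kpi) key, roughly like a materialized view feeding a
    summing table. Partial states are combined at read time, standing in
    for the background merges a real engine would perform.
    """

    def __init__(self, bucket_ms=60_000):
        self.bucket_ms = bucket_ms
        # Several partial states may exist per key until they are merged.
        self.parts = defaultdict(list)

    def insert(self, events):
        """events: iterable of (timestamp_ms, kpi_name, value) tuples."""
        batch = defaultdict(lambda: [0, 0.0])
        for ts_ms, kpi, value in events:
            key = (ts_ms // self.bucket_ms, kpi)
            batch[key][0] += 1
            batch[key][1] += value
        # Only the aggregated batch is kept; raw events are not stored here.
        for key, (count, total) in batch.items():
            self.parts[key].append((count, total))

    def query(self, bucket, kpi):
        """Merge the partial states for one key: counts and sums simply add."""
        states = self.parts.get((bucket, kpi), [])
        return sum(c for c, _ in states), sum(t for _, t in states)


view = RollupView()
view.insert([(1_000, "revenue", 2.5), (59_999, "revenue", 1.5)])
view.insert([(30_000, "revenue", 4.0), (61_000, "revenue", 10.0)])
print(view.query(0, "revenue"))  # (3, 8.0)
print(view.query(1, "revenue"))  # (1, 10.0)
```

Storing per-batch partial aggregates keeps reads cheap, because a query touches a handful of pre-aggregated rows instead of every raw event; how far that scales in practice depends entirely on the workload.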

> Scale. The distributed plan is broken, and I'm not sure ClickHouse even has shuffle.

First, I hate the word "broken" with zero explanation of what you mean by it. Based on your language, I'm assuming you're just suggesting the distributed plans aren't as efficient as they could be, a limitation that the engineers are not shy to admit.
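For readers following the "shuffle" point: in many distributed engines, a shuffle hash-partitions rows by a key across workers so that every row for a given key lands on one worker, which can then aggregate or join locally. The Python below is a generic single-process sketch of that idea (a hash exchange followed by local aggregation); the function names are made up, and it makes no claim about how ClickHouse or BigQuery actually plan distributed queries.

```python
import zlib
from collections import defaultdict


def shuffle(partitions, key_fn, num_workers):
    """Hash-repartition rows so that every row with the same key goes to one worker."""
    outputs = [[] for _ in range(num_workers)]
    for part in partitions:
        for row in part:
            # crc32 is deterministic across runs, unlike Python's salted str hash().
            key_bytes = str(key_fn(row)).encode("utf-8")
            outputs[zlib.crc32(key_bytes) % num_workers].append(row)
    return outputs


def local_sum(rows, key_fn, value_fn):
    """Aggregate one worker's rows; correct only because the shuffle co-located each key."""
    totals = defaultdict(int)
    for row in rows:
        totals[key_fn(row)] += value_fn(row)
    return dict(totals)


# Rows for the same user start out on different input nodes.
node_a = [("alice", 3), ("bob", 1)]
node_b = [("alice", 2), ("carol", 7)]

workers = shuffle([node_a, node_b], key_fn=lambda r: r[0], num_workers=2)
per_worker = [local_sum(rows, lambda r: r[0], lambda r: r[1]) for rows in workers]

# No key appears on two workers, so the final step is a plain union, not another merge.
final = {}
for partial in per_worker:
    final.update(partial)
print(final)
```

The alternative, where every node sends partial aggregates to a single coordinator for the final merge, is simpler but can make that coordinator a bottleneck for high-cardinality keys.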

> SQL. It is very non-standard.

I would argue the language is more a superset than "non-standard". Almost everything just worked for us, and I often found SQL that I could simplify significantly thanks to the "non-standard" extras they've added. For example: did you know they have built-in aggregate functions for computing retention?!

> Knobs. Lots of knobs that are poorly documented. It's unclear which are mandatory. You have to restart for most.

Yes, there are a lot of knobs. ClickHouse works wonderfully out of the box with the defaults, but you're free to tinker; that's how flexible the technology is.

You worked at Google for over a decade? Then you should know: Google's internal technology (e.g. Bigtable) is notorious for having a TON of knobs. Just because the knobs are there doesn't mean they must be tuned; it just means the engineers thought ahead. Also, the vast majority of configuration changes I've made never required a restart, so I'm not even sure why you pointed this out.
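The "knobs without restarts" point is easier to see with layered settings: when a narrower scope (a single query, then a session, then a user profile) overrides broader defaults, most tuning can happen per query or per session. The Python below is a generic sketch of that precedence, assuming this layering; `max_threads` and `max_memory_usage` are real ClickHouse setting names, but the values and the `effective_settings` helper are made up for illustration.

```python
from collections import ChainMap

# Hypothetical layers, broadest first; the numbers are arbitrary examples.
SERVER_DEFAULTS = {"max_threads": 16, "max_memory_usage": 10_000_000_000}
USER_PROFILE = {"max_memory_usage": 20_000_000_000}


def effective_settings(session, query):
    """Resolve settings so the narrowest scope wins: query > session > profile > defaults."""
    # ChainMap searches its maps left to right, so the narrowest layer goes first.
    return dict(ChainMap(query, session, USER_PROFILE, SERVER_DEFAULTS))


# A session-level change applies to later lookups immediately; nothing restarts.
session = {"max_threads": 8}
print(effective_settings(session, {}))
# A single query can override it again without touching the session.
print(effective_settings(session, {"max_threads": 2}))
```

Untouched settings simply fall through to the defaults, which is the sense in which having many knobs does not oblige anyone to tune them.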

(Disclaimer: I have been using ClickHouse successfully for several years)

Funny to read this, since in my testing so far DuckDB is not quite rock-stable yet. It requires preloading the httpfs extension and setting five parameters on each run just to query a file on S3 (often very slowly, mostly because of the folder crawling/listing logic?), uses non-standard SQL for extensions, and segfaults frequently during testing if the remote server returns any unexpected response (it also doesn't work too well with S3 clones). The WASM version differs in its S3 settings, too, but this is not well documented. If you really love open source, then with such a long career you should know there is no point in badmouthing other projects to achieve success, particularly from a leadership position. Just make DuckDB great and let happy users speak on your behalf. Nobody wants to be part of a toxic community this early on.
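For reference, the per-run S3 setup being complained about looks roughly like the statements generated below. This is a sketch assuming the `SET s3_*` settings of DuckDB's httpfs extension (`s3_region`, `s3_access_key_id`, `s3_secret_access_key`, plus `s3_endpoint` and `s3_url_style` for S3-compatible stores); the `s3_setup_statements` helper, the bucket path, and all values are placeholders, and newer DuckDB releases may autoload httpfs and also offer `CREATE SECRET` as an alternative.

```python
def s3_setup_statements(region, key_id, secret, endpoint=None, url_style=None):
    """Build the statements a new DuckDB session runs before reading from S3 (hypothetical helper)."""

    def quote(value):
        # Escape single quotes for a SQL string literal.
        return "'" + str(value).replace("'", "''") + "'"

    stmts = [
        "INSTALL httpfs;",  # one-time download of the extension
        "LOAD httpfs;",     # load it into the current DuckDB instance
        f"SET s3_region = {quote(region)};",
        f"SET s3_access_key_id = {quote(key_id)};",
        f"SET s3_secret_access_key = {quote(secret)};",
    ]
    if endpoint is not None:
        # S3-compatible stores (e.g. MinIO) usually need a custom endpoint...
        stmts.append(f"SET s3_endpoint = {quote(endpoint)};")
    if url_style is not None:
        # ...and often path-style URLs instead of virtual-hosted ones.
        stmts.append(f"SET s3_url_style = {quote(url_style)};")
    return stmts


statements = s3_setup_statements(
    region="us-east-1",
    key_id="PLACEHOLDER_KEY_ID",
    secret="PLACEHOLDER_SECRET",
    endpoint="localhost:9000",  # e.g. a local S3 clone
    url_style="path",
)
for stmt in statements:
    print(stmt)

# With the duckdb package installed, a session would then look roughly like:
#   import duckdb
#   con = duckdb.connect()
#   for stmt in statements:
#       con.execute(stmt)
#   # Globbing a prefix requires listing objects, i.e. the folder-crawling step.
#   con.execute("SELECT count(*) FROM read_parquet('s3://my-bucket/data/*.parquet')").fetchall()
```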

I do NOT work for ClickHouse, but I've been running super stable distributed CH clusters for years.

Funny to notice the same posting pattern from MotherDuck in other threads. It's not a good look.

It is quite surreal that in addition to the usual Databricks astroturfing and various fans arguing, we now also have founders and VPs of companies arguing about who is better on HN.