Hacker News
by mritchie712 1403 days ago
I predict[0] we'll see more people choosing ClickHouse over Snowflake in the next 5 years. ClickHouse will get reasonably feature-compatible with Snowflake and give people a better escape hatch if they want to self-host their data stack. ClickHouse, Inc. is building a cloud product that abstracts away the complexity, and there are already companies like Altinity that will spin up a cluster for you in minutes.

0 - https://blog.luabase.com/clickhouse-for-data-nerds/

2 comments

Isn't ClickHouse a hosted SQL DBMS? It's not really comparable to a cloud data lake.

Snowflake and Databricks scale infinitely across cloud object stores like S3. ClickHouse runs as a single (or sharded) process that uses the local file system like any other SQL database, and it requires volume provisioning as your data grows. It also has a fixed run cost (EC2 or wherever it's hosted), versus an "on-demand" model where read clusters are spun up to run queries against static objects that have no fixed cost other than storage pricing.
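
To make the fixed-versus-on-demand argument concrete, here is a rough back-of-the-envelope sketch. The function names and every price in the usage note are made up for illustration, and real bills from either kind of vendor include items this ignores (egress, minimum billing increments, support tiers).

```python
def always_on_cost(nodes, node_hourly, provisioned_tb, volume_tb_month,
                   hours_in_month=730.0):
    """Monthly cost of a cluster that runs all the time on provisioned volumes.

    All arguments are hypothetical inputs, not real vendor prices.
    """
    return nodes * node_hourly * hours_in_month + provisioned_tb * volume_tb_month


def on_demand_cost(compute_hours, compute_hourly, stored_tb, object_tb_month):
    """Monthly cost when compute is billed only while queries run,
    plus object storage for the data at rest."""
    return compute_hours * compute_hourly + stored_tb * object_tb_month


def break_even_compute_hours(always_on, compute_hourly, stored_tb, object_tb_month):
    """Compute hours per month at which on-demand costs as much as always-on."""
    hours = (always_on - stored_tb * object_tb_month) / compute_hourly
    return max(0.0, hours)
```

With invented numbers (2 nodes at 0.50/hour and 1 TB of volumes at 100/TB-month, versus on-demand compute at 4.00/hour and 1 TB of objects at 25/TB-month), the always-on cluster comes to 830.0 a month, and on-demand only catches up after about 201 compute hours. Below that usage the on-demand model is cheaper; above it the fixed cluster wins.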

ClickHouse can access non-local storage without issue (or at least, with only issues for some of them - HDFS and S3 seem to work fine, I've had less luck with real-time Kafka). I'm not sure how well it scales horizontally for such uses; you can hack something up with macros that isn't too painful but there may also be better options.

However, it's probably not a great pick if you're already struggling with the operations side of things, which seems to be the main selling point for services like Snowflake.

ClickHouse only has fixed run cost if you configure it that way. We run ClickHouse clusters in AWS / GCS using block storage in our cloud platform. You can scale VMs up and down vertically in minutes, and scale horizontally in the same amount of time. The model works great for SaaS use cases that require constant response at all times and scale over days or weeks rather than minutes. Real-time analytic apps that show tenant dashboards or generate recommendations for users on ecommerce sites have this characteristic.

I don't think there's really a right or wrong answer here, just trade-offs.

Disclaimer: I work on Altinity.Cloud, a platform for managed ClickHouse

In which way not comparable?
From the article: "JOIN's are also not nearly as performant as in other cloud data warehouses." This seems like a pretty significant limitation.
That's... literally comparing them. The comparison for some use cases might not be favorable for ClickHouse, but they're comparable.

(IMO the slowness of ClickHouse joins has been overstated, especially since its many-column table support is so good you'll probably be fine joining on insert instead.)
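
For anyone unsure what "joining on insert" means here, a minimal sketch of the idea in plain Python: look up the dimension attributes once, at write time, and store them on the wide row so reads never need a JOIN. The field names (`user_id`, `country`, `plan`) and the `denormalize` helper are hypothetical, and this says nothing about how you'd actually ship the rows to ClickHouse.

```python
def denormalize(events, users_by_id):
    """Copy user attributes onto each event before it is inserted.

    `events` is an iterable of dicts with a `user_id` key; `users_by_id`
    maps user_id to a dict of attributes. Both shapes are assumptions
    made for this example.
    """
    wide_rows = []
    for event in events:
        # Unknown users get None columns instead of dropping the event.
        user = users_by_id.get(event["user_id"], {})
        wide_rows.append({
            **event,
            "user_country": user.get("country"),
            "user_plan": user.get("plan"),
        })
    return wide_rows
```

The trade-off is that a later change to a user's attributes does not propagate to rows that were already written, so this fits best when the attributes are stable or when you want the value as of insert time.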

Yes, this is one major hurdle they need to overcome, but I think they (ClickHouse, Inc. + the community) will pull it off. It's a current weakness but by no means unsolvable.
ClickHouse is incredible software. It only feels a little foreign when coming from Postgres (e.g. some CamelCase terms).
Yeah, the CamelCase throws me too, especially since it's mixed in with snake_case (e.g. date_trunc[0])

0 - https://clickhouse.com/docs/en/sql-reference/functions/date-...

camelCase - native functions

SQL_STYLE_CASE - compatible functions