Hacker News
by deepsun 2302 days ago
I can speak to BigQuery and Databricks from personal experience.

BigQuery is much slower than ClickHouse and much more expensive for both storage and queries.

Databricks (Spark) is even slower than that (both I/O and compute), although you can write custom code and use libraries.

You seem to underestimate how heavily ClickHouse is optimized (e.g. compressed storage).


> You seem to underestimate how heavily ClickHouse is optimized (e.g. compressed storage).

Is it any more compressed than Apache Hive's ORC format (https://orc.apache.org)? I ask because ORC is increasingly accepted as a storage format in a lot of these analytical systems.

Yes, looks like it. According to these posts, ORC only uses Snappy or zlib compression, while ClickHouse uses the double-delta, Gorilla, and T64 algorithms.

https://engineering.fb.com/core-data/even-faster-data-at-the...

https://www.altinity.com/blog/2019/7/new-encodings-to-improv...
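To give a feel for why a codec like double-delta helps on time-series columns, here is a rough Python sketch of the idea. It is only an illustration, not ClickHouse's actual implementation (a real codec would also bit-pack the output), and the function names are made up for this example.

```python
def double_delta_encode(values):
    """Return (first value, first delta, list of delta-of-deltas)."""
    if not values:
        return None, None, []
    if len(values) == 1:
        return values[0], None, []
    # First-order differences between neighbouring values.
    deltas = [b - a for a, b in zip(values, values[1:])]
    # Second-order differences: near zero for evenly spaced data.
    dods = [b - a for a, b in zip(deltas, deltas[1:])]
    return values[0], deltas[0], dods


def double_delta_decode(first, first_delta, dods):
    """Rebuild the original sequence from the encoded triple."""
    if first is None:
        return []
    if first_delta is None:
        return [first]
    values = [first, first + first_delta]
    delta = first_delta
    for dod in dods:
        delta += dod
        values.append(values[-1] + delta)
    return values


# Hypothetical timestamps sampled roughly every 10 units.
timestamps = [1000, 1010, 1020, 1030, 1041, 1051]
encoded = double_delta_encode(timestamps)
print(encoded)
print(double_delta_decode(*encoded) == timestamps)
```

For regularly spaced values the delta-of-deltas are mostly zeros or tiny integers, which take far fewer bits to store than the raw values. A general-purpose compressor like Snappy or zlib does not exploit that structure directly.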

ORC and Parquet are file storage formats, so without context their performance can be almost anything. Where is the data stored? S3? HDFS? A local RAM disk?

ClickHouse manages the whole stack for you: distributed storage, RAM caching, and so on.

In my experience, a unified, single-purpose, vertically integrated solution will be faster than a bunch of kitchen-sink solutions bolted together.