by _dark_matter_
1682 days ago
Thanks for the additional context here. As someone who works for a company that pays for both Databricks and Snowflake, I will say that these results don't surprise me. In my experience, Spark has always been extremely configurable. There are probably tens of thousands of possible configurations, covering everything from Java heap size to Parquet block size. Snowflake is the opposite: you can't even specify partitions! There is only clustering. For a business, running Snowflake is easy because engineers don't have to babysit it, and we like it because we're free to work on more interesting problems. Everybody wins. Unless those problems are DB optimization. Then Snowflake can actually get in your way.
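For a sense of what two of those knobs look like in practice, here is a rough sketch that renders them as `spark-submit` flags. It assumes "Java heap size" maps to `spark.executor.memory` and "Parquet block size" to the Hadoop setting `parquet.block.size` (passed through Spark with the `spark.hadoop.` prefix). The values, the `build_conf_flags` helper, and `job.py` are made up for illustration.

```python
# Illustrative only: two of the many Spark settings, rendered as spark-submit flags.
# The values are arbitrary examples, not recommendations.
spark_conf = {
    # Executor JVM heap size.
    "spark.executor.memory": "8g",
    # Parquet row-group ("block") size in bytes, forwarded to the Hadoop config.
    "spark.hadoop.parquet.block.size": str(128 * 1024 * 1024),
}


def build_conf_flags(conf):
    """Hypothetical helper: turn a settings dict into spark-submit --conf arguments."""
    flags = []
    for key, value in sorted(conf.items()):
        flags += ["--conf", f"{key}={value}"]
    return flags


# job.py is a placeholder name for the application script.
print(" ".join(["spark-submit"] + build_conf_flags(spark_conf) + ["job.py"]))
```

Every extra entry in that dict is another thing someone has to choose, test, and maintain, which is the babysitting cost described above.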
|
As a matter of fact, we took the extreme approach of not allowing customers (or ourselves) to set any of the known knobs. We want to force ourselves to build a system that runs well out of the box and still beats data warehouses on price/performance. The official result involved no tuning. We partitioned the data by date, loaded it in, provisioned a Databricks SQL endpoint, and that's it. No additional knobs or settings. (In fact, Snowflake's own sample TPC-DS dataset has more tuning than ours. They clustered by multiple columns specifically to optimize for the exact set of queries.)
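As a toy illustration of why partitioning by date helps even with no other tuning, the sketch below groups rows by date so that a date-range query only reads the matching partitions. This is a conceptual model in plain Python with invented data and a hypothetical `total_between` function; it is not how Databricks or Snowflake implement partitioning or clustering.

```python
from collections import defaultdict
from datetime import date

# Toy "table": rows of (sale_date, amount). The data is invented for illustration.
rows = [
    (date(2021, 11, 1), 10.0),
    (date(2021, 11, 1), 5.0),
    (date(2021, 11, 2), 7.5),
    (date(2021, 11, 3), 2.5),
]

# Partition by date: one bucket per distinct sale_date.
partitions = defaultdict(list)
for sale_date, amount in rows:
    partitions[sale_date].append(amount)


def total_between(start, end):
    """Sum amounts for start <= sale_date <= end, reading only matching partitions."""
    scanned = [d for d in partitions if start <= d <= end]
    total = sum(sum(partitions[d]) for d in scanned)
    return total, len(scanned)


total, partitions_read = total_between(date(2021, 11, 1), date(2021, 11, 2))
print(total, partitions_read)
```

A query filtered on date skips the partitions outside its range, while a query filtered on any other column still has to read all of them. That is why clustering on several columns chosen for a known query set counts as additional tuning.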