| HN Mirror


by _dark_matter_ 1682 days ago

Thanks for the additional context here. As someone who works for a company that pays for both databricks and snowflake, I will say that these results don't surprise me.

Spark has always been infinitely configurable, in my experience. There are probably tens of thousands of possible configurations; everything from Java heap size to parquet block size.

Snowflake is the opposite: you can't even specify partitions! There is only clustering.

For a business, running snowflake is easy because engineers don't have to babysit it, and we like it because now we're free to work on more interesting problems. Everybody wins.

Unless those problems are DB optimization. Then snowflake can actually get in your way.


rxin 1682 days ago

Totally. Simplicity is critical. That’s why we built Databricks SQL not based on Spark.

As a matter of fact, we took the extreme approach of not allowing customers (or ourselves) to set any of the known knobs. We want to force ourselves to build the best system, one that runs well out of the box and still beats data warehouses in price/performance. The official result involved no tuning: we partitioned by date, loaded the data in, provisioned a Databricks SQL endpoint, and that’s it. No additional knobs or settings. (As a matter of fact, Snowflake’s own sample TPC-DS dataset has more tuning than ours did. They clustered by multiple columns specifically to optimize for the exact set of queries.)

geoduck14 1682 days ago

>That’s why we built Databricks SQL not based on Spark.

Wait... really? The sales folks I've been talking to didn't mention this. I assumed that when I ran SQL inside my Python, it was decomposed into Spark SQL with weird join problems (and other nuances I'm not fully familiar with).

Not that THAT would have changed my mind. But it would have changed the calculus of "who uses this tool at my company" and "who do I get on board with this thing"

Edit: To add, I've been a customer of Snowflake for years. I've been evaluating Databricks for 2 months, and put the POC on hold.

alexott 1682 days ago

it's different - rxin talks about this: https://databricks.com/product/databricks-sql

when you run Python, it's on Spark, although you can now use the Photon engine, which DB SQL uses by default
