| Delta Lake is not meaningfully more "open" than whatever Snowflake (or BigQuery, or Redshift) is doing. It does not require any less "moving data around". With all of these, the data sits on cloud storage and compute is done by cloud machines; the difference between Databricks and the others is that with Databricks, you can take a look at that bucket. But you're not going to be able to do much with that data without paying for Databricks compute, since the open-source Delta library is not usable in the real world. Since commercial data warehouses are an enterprise product for enterprise companies (small companies can stick with normal databases or SaaS, and unicorns seem to roll their own with Presto/Trino, Iceberg, Spark and k8s nowadays), the vendor and the product need to be, above all, a reliable partner. And Databricks' behavior does not inspire confidence that they are one. If I'm outsourcing my analytical platform to a vendor, I want them to be almost boring. Not some growth-hacking, guerrilla-marketing, sketchy-benchmark-posting techbros. At the end of the day, anyone making years-long, million-dollar decisions in this space should run their own evaluation. Our evaluation showed that there's a noticeable gap between what Databricks promises and what they deliver. I have not worked with Snowflake to compare. |
The rest of this is vague claims about Databricks being unreliable techbros, blah blah, which is just emotionally charged hot air rather than anything based on evidence.
RE who to pick: run them side by side. Use Snowflake for non-technical staff and BI workloads, and load prepared cuts of data into it. It's batteries-included, with fewer knobs to twiddle for optimisation. Databricks/Spark has a learning curve and isn't suitable for non-technical staff, but it gives you a lot more processing options for all the stuff that doesn't fit neatly into data clustering.