

by glogla 1685 days ago

Delta Lake is not meaningfully more "open" than whatever Snowflake (or BigQuery or Redshift) is doing. It does not require any less "moving data around".

With all of these, the data sits on cloud storage and compute is done by cloud machines. The difference between Databricks and the others is that with Databricks, you can take a look at that bucket. But you're not going to be able to do much with that data without paying for Databricks compute, since the open source Delta library is not usable in the real world.

Since commercial data warehouses are an enterprise product for enterprise companies (small companies can stick with normal databases or SaaS, and unicorns seem to roll their own with Presto/Trino, Iceberg, Spark and k8s nowadays), the vendor needs to be, most of all, a reliable partner. And Databricks' behavior does not inspire confidence in them being that.

If I'm outsourcing my analytical platform to a vendor, I want them to be almost boring. Not some growth-hacking, guerrilla-marketing, sketchy-benchmark-posting techbros.

At the end of the day, anyone making multi-year, million-dollar decisions in this space should run their own evaluation. Our evaluation showed that there's a noticeable gap between what Databricks promises and what they deliver. I have not worked with Snowflake to compare.

1 comments

Tanjreeve 1684 days ago

Delta Lake is very much open. You can install Delta Lake and run it yourself. It's a transaction layer running over Parquet files. You can go to the delta.io GitHub and install the binaries yourself. Snowflake cannot be run independently of their cloud.
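To make "a transaction layer running over parquet files" a bit more concrete, here is a toy sketch in plain Python. It is not the Delta library and writes no real Parquet data. It only mimics the general idea as I understand the open format: a `_delta_log` directory of numbered JSON commit files, each listing data files added or removed, which a reader replays to work out what the table currently contains. The helper names `commit` and `live_files` and the `part-*.parquet` file names are made up for illustration, and the real log carries much more (schema metadata, protocol versions, checkpoints).

```python
import json
import os
import tempfile


def commit(table_dir, version, actions):
    """Write one commit as a JSON-lines file in the table's log directory."""
    log_dir = os.path.join(table_dir, "_delta_log")
    os.makedirs(log_dir, exist_ok=True)
    # Zero-padded names keep lexicographic order equal to version order.
    path = os.path.join(log_dir, f"{version:020d}.json")
    # Mode "x" fails if the file already exists, so two writers cannot
    # both claim the same version number (on a local filesystem).
    with open(path, "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def live_files(table_dir):
    """Replay the log in version order to find the current data files."""
    log_dir = os.path.join(table_dir, "_delta_log")
    files = set()
    for name in sorted(os.listdir(log_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(log_dir, name)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    files.add(action["add"]["path"])
                elif "remove" in action:
                    files.discard(action["remove"]["path"])
    return sorted(files)


# Hypothetical table: two appends, then a compaction-style rewrite that
# swaps two small files for one in a single commit.
table = tempfile.mkdtemp()
commit(table, 0, [{"add": {"path": "part-0000.parquet"}}])
commit(table, 1, [{"add": {"path": "part-0001.parquet"}}])
commit(table, 2, [
    {"remove": {"path": "part-0000.parquet"}},
    {"remove": {"path": "part-0001.parquet"}},
    {"add": {"path": "part-0002.parquet"}},
])
print(live_files(table))  # ['part-0002.parquet']
```

The point of the sketch is that the data files themselves never change. A reader sees either the state before commit 2 or the state after it, never a half-finished rewrite, and that is roughly what the "transaction layer" buys you on top of a bucket of files.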

The rest of this is some vague claims of Databricks being unreliable techbros blah blah which is just emotionally charged hot air rather than being based on anything.

Re: who to pick. Run them side by side. Use Snowflake for non-technical staff/BI load on prepared cuts of data. It's batteries included, with fewer knobs to twiddle for optimisation. Databricks/Spark has a learning code and isn't suitable for non-technical staff. But it gives a lot more options for processing all the stuff that doesn't fit neatly into data clustering.


imslowbutnice 1683 days ago

Sort of. You can stop using the Databricks service and keep using Delta Lake. But Databricks' code is not open. Delta Lake is not equivalent to Databricks Delta. The value prop is that customers who choose not to retain the Databricks service can migrate off Databricks and still use the open source version of Delta Lake, which, again, is not as good as Databricks Delta.


Tanjreeve 1683 days ago

OK, you've got me there: it's not 100% the exact same code Databricks are using, and there are some optimisations (that normally do end up downstream anyway). But I think it's getting a bit philosophical to say it's not open when you can run a Delta Lake "on-prem" and shuffle data between Databricks and your own setup with few or no changes. Now, the Databricks SQL product afaik is not open. It's a proprietary C++ engine comparable to Snowflake, so I think these discussions might get a lot more confusing in the future, when Databricks doesn't just mean various flavours of Spark.


imslowbutnice 1682 days ago

Yes, Photon is completely proprietary. Databricks does have a "delta" version, but it is completely baked into the Databricks runtime. So we are both correct. Ali (Databricks CEO) has actually gone on record to say Databricks is 90% proprietary code. There is an open source version, but it is not as good. The culture within Databricks, though, is completely open source. Snowflake's culture, by contrast, is definitely not open source. I think it affects the culture too.


Tanjreeve 1684 days ago

By "learning code" I mean learning curve. You need to be able to code a little bit at a minimum to use Spark effectively. Even if a lot of the time you can just go with the SQL interface, it isn't actually a SQL database under the surface, so that can be a bit misleading if you don't know what's going on.
