by Tanjreeve 1679 days ago
Delta Lake is very much open. You can install Delta Lake and run it yourself. It's a transaction layer running over Parquet files. You can go to the delta.io GitHub and install the binaries yourself. Snowflake, by contrast, cannot be run independently of its cloud.
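To make "a transaction layer running over Parquet files" concrete, here is a minimal toy sketch under stated assumptions. The data files are just immutable Parquet files; the "transaction" part is an ordered log of commits that say which files were added or removed. The `_delta_log` directory and zero-padded JSON commit files mirror the layout the Delta Lake protocol describes, but the functions `commit`, `snapshot` and `latest_version` are hypothetical illustrations, not the delta.io library's API, and no Parquet is actually written (only file names are tracked).

```python
import json
import os
import tempfile

LOG_DIR = "_delta_log"  # log directory name, as in the Delta protocol layout


def _commit_path(table_dir: str, version: int) -> str:
    # Commits are named by zero-padded version, e.g. 00000000000000000000.json
    return os.path.join(table_dir, LOG_DIR, f"{version:020d}.json")


def latest_version(table_dir: str) -> int:
    """Return the highest committed version, or -1 if there are no commits."""
    log = os.path.join(table_dir, LOG_DIR)
    if not os.path.isdir(log):
        return -1
    versions = [int(name[:-5]) for name in os.listdir(log) if name.endswith(".json")]
    return max(versions, default=-1)


def commit(table_dir: str, actions: list[dict], read_version: int) -> int:
    """Hypothetical helper: write `actions` as version read_version + 1.

    Mode "x" raises FileExistsError if that version already exists, so two
    writers that read the same snapshot cannot both claim the same version.
    This is a local-filesystem stand-in for the atomic put-if-absent a real
    log store needs; it does not stop a reader from seeing a half-written file.
    """
    os.makedirs(os.path.join(table_dir, LOG_DIR), exist_ok=True)
    version = read_version + 1
    with open(_commit_path(table_dir, version), "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")  # one JSON action per line
    return version


def snapshot(table_dir: str) -> set[str]:
    """Replay commits in order to get the set of live data files."""
    live: set[str] = set()
    for v in range(latest_version(table_dir) + 1):
        with open(_commit_path(table_dir, v)) as f:
            for line in f:
                action = json.loads(line)
                if "add" in action:
                    live.add(action["add"]["path"])
                elif "remove" in action:
                    live.discard(action["remove"]["path"])
    return live


with tempfile.TemporaryDirectory() as t:
    v0 = commit(t, [{"add": {"path": "part-0000.parquet"}}], read_version=-1)
    v1 = commit(t, [{"add": {"path": "part-0001.parquet"}}], read_version=v0)
    # A compaction: swap two small files for one rewritten file in a single commit
    v2 = commit(t, [{"remove": {"path": "part-0000.parquet"}},
                    {"remove": {"path": "part-0001.parquet"}},
                    {"add": {"path": "part-0002.parquet"}}], read_version=v1)
    print(snapshot(t))  # {'part-0002.parquet'}
    try:
        # A writer still working from version 1 loses the race for version 2
        commit(t, [{"add": {"path": "stale.parquet"}}], read_version=v1)
    except FileExistsError:
        print("conflict: version 2 already committed")
```

The point of the design is that readers never trust a directory listing of data files: they replay the log, so a commit that swaps several files is meant to appear all at once or not at all.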

The rest of this is vague claims about Databricks being unreliable techbros, blah blah, which is just emotionally charged hot air rather than anything based on evidence.

RE who to pick: run them side by side. Use Snowflake for non-technical staff and BI, and load prepared cuts of data into it. It's batteries-included, with fewer knobs to twiddle for optimisation. Databricks/Spark has a learning code and isn't suitable for non-technical staff, but it gives a lot more processing options for all the stuff that doesn't fit neatly into data clustering.


Sort of. You can stop using the Databricks service and keep using Delta Lake. But Databricks' code is not open, and Delta Lake is not equivalent to Databricks Delta. The value proposition is that customers who choose not to keep the Databricks service can migrate off Databricks and still use the open-source version of Delta Lake, which, again, is not as good as Databricks Delta.
OK, you've got me there: it's not 100% the exact same code Databricks are using, and there are some optimisations (which normally do end up downstream anyway). But I think it's getting a bit philosophical to say it's not open when you can run Delta Lake "on-prem" and shuffle data between Databricks and your own setup with few or no changes. The Databricks SQL product, afaik, is not open; it's a proprietary C++ engine comparable to Snowflake. So I think these discussions might get a lot more confusing in the future, when Databricks doesn't just mean various flavours of Spark.
Yes, Photon is completely proprietary. Databricks does have a "Delta" version, but it is actually completely baked into the Databricks runtime. So we are both correct. Ali (the Databricks CEO) has actually gone on record saying Databricks is 90% proprietary code. There is an open-source version, but it is not as good. The culture within Databricks, though, is completely open source; Snowflake's culture, by contrast, is definitely not. I think it affects the culture too.
By "learning code" I mean learning curve. You need to be able to code at least a little to use Spark effectively. Even if a lot of the time you can just go with the SQL interface, it isn't actually a SQL database under the surface, so that can be a bit misleading if you don't know what's going on.