> I'd hope so if they're selling a product based on FOSS code, there has to be a value add to justify it
They have some proprietary features like DBIO [1]. They also have some cloud-specific features like storage autoscaling [2] that would not be available in OSS Spark. Even Delta Lake [3] used to be proprietary, but I suspect the rise of open-source frameworks like Iceberg led them to open-source it.
Shameless plug: while working at a since-shut-down competitor to Databricks, I came up with storage autoscaling long before they did [4], so it's not unlikely that they were "inspired" by us :-)
The open-source Delta is not a replacement for the real thing: it leaves out features such as optimizing small files (the small file problem is well known in big data, and gets much worse once streaming is involved), among others. It is more of a demo of the real thing. Which does not stop them from repeating everywhere how open they are, of course.
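For anyone who hasn't run into the small file problem: the fix is usually some form of compaction, where many small files get rewritten into fewer large ones. Here is a rough sketch of just the planning step, as a toy greedy grouping in plain Python. To be clear, this is not how Databricks implements it; `plan_compaction`, the 128 MiB default target, and the file names are all made up for illustration.

```python
def plan_compaction(file_sizes, target_bytes=128 * 1024 * 1024):
    """Group small files into batches of at most target_bytes each.

    file_sizes maps a (hypothetical) file path to its size in bytes.
    Returns a list of batches; each batch is a list of paths that
    could be rewritten together into a single larger file.
    """
    batches, current, current_size = [], [], 0
    # Visit files from smallest to largest so tiny files pack together.
    for path, size in sorted(file_sizes.items(), key=lambda kv: kv[1]):
        if size >= target_bytes:
            continue  # already large enough, leave it alone
        if current and current_size + size > target_bytes:
            # Rewriting a single file on its own gains nothing, so skip it.
            if len(current) > 1:
                batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if len(current) > 1:
        batches.append(current)
    return batches
```

With a streaming job writing a file every few seconds, the input to something like this grows quickly, which is why having it built in matters.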
EDIT: Delta also still keeps partitioning information in the Hive metastore, while Iceberg keeps it in storage, which makes Iceberg a far superior design. Adopting Iceberg is harder, though, because third-party tools like AWS Redshift do not support it, so you have to go 100% of the way.