| I often hear Apache Iceberg and Delta Lake mentioned as if they’re two peas in the Open Table Formats pod. Yet compare the two specs. The Apache Iceberg table format specification is at https://iceberg.apache.org/spec/ and, as they like to say in patent law, anyone “skilled in the art” of database systems could use it to build and query Iceberg tables without too much difficulty. Its nominal Delta Lake equivalent is https://github.com/delta-io/delta/blob/master/PROTOCOL.md and I defy anyone to even scope out the level of effort required to fully implement the current spec, let alone what it would take to keep up to date as this beast evolves. Frankly, the Delta Lake spec reads like a reverse engineering of whatever implementation tradeoffs Databricks is making as they race to build out a lakehouse for every Fortune 1000 company burned by Hadoop (which is to say, most of them). My point is that I’ve yet to be convinced that buying into Delta Lake is actually buying into an open ecosystem. I would appreciate any reassurance on this front! Edit: the GitHub history of the spec at https://github.com/delta-io/delta/commits/master/PROTOCOL.md is unfortunately not reassuring: random features and tweaks just popping up, PR’d by Databricks engineers and promptly approved by Databricks senior engineers… |