I think the blog post should point out very early that Onehouse is a Hudi company. There are some other recent benchmarks published in CIDR by Databricks that might paint a different picture: https://petereliaskraft.net/res/cidr_lakehouse.pdf
Thanks for the link. I'd be interested to see a perf comparison using a popular processing engine other than Spark, given the obvious potential for Delta Lake to be better tuned for Spark workloads by default.
In a Databricks-published benchmark, of course Delta is the fastest. I have also seen companies that use Iceberg publish benchmarks showing that Iceberg is the fastest.
I think vendor-published benchmarks are fine if the dataset is open / accessible, the benchmark code is published, all software versions are disclosed, and the exact hardware is specified. I definitely wouldn't consider an audited TPC benchmark that's based on industry-standard datasets / queries worthless in the data space. Disclosure: I work for Databricks.
It looks like the benchmarks used the latest versions of Delta and Iceberg, but chose a version of Hudi that is over 6 months old. Hudi v0.12.2, which the benchmark did not consider, is more advanced than v0.12.0. As the Databricks CIDR paper states, and as mentioned in the Onehouse article, Hudi is optimized by default for UPSERTs rather than INSERTs, and switching that default is a 1-line config change that is appropriate for a true apples-to-apples comparison. See both: https://www.onehouse.ai/blog/apache-hudi-vs-delta-lake-trans... and https://github.com/brooklyn-data/delta/pull/2
Hah, I could tell because in the "feature matrix" the Hudi column was mostly green compared to the others. Immediately made me suspicious so I looked it up and sure enough, not exactly an unbiased source.
Feature matrices are extremely easy to game depending on your choice of rows.
I recently evaluated these frameworks and went through all the links they provide for each of those rows when this was first published a few months ago. FWIW, I did not find any inaccuracies or wrong pointers.
It's funny how, on one hand, you argue for objectivity but, on the other, fundamentally distrust/write off the chance that someone could have created a Hacker News account today and commented here, without a shred of evidence. Maybe now I am getting trained in the HN ways.