by AdamProut
1679 days ago
I would say that TPC-DS and TPC-H are really table-stakes benchmarks for data warehouses at this point (maybe they weren't 10 years ago). How to build a database that does well on them is well documented in the literature now [1][2][3][4] (and maybe a few other papers). It's not easy to build such a database, but it's "just" hard work, and many companies have the $$ necessary to do that work. There isn't any magic or technical moat in the results for Databricks (or Snowflake, or Redshift, etc.). I think Databricks is overly enthusiastic about their results because they have been trying to be competitive with cloud DWs on these benchmarks for a number of years now. They have finally caught up (by building Delta Lake and their Photon query engine, which implement a number of standard DW features).
[1] http://www.vldb.org/pvldb/vol13/p1206-dreseler.pdf
[2] https://stratos.seas.harvard.edu/files/stratos/files/columnstoresfntdbs.pdf
[3] https://web.stanford.edu/class/cs245/readings/c-store.pdf
[4] http://sites.computer.org/debull/A12mar/vectorwise.pdf
|
|
The public pissing contest is entertaining while also being silly and slightly cringe, but I think it's a nice story for Databricks nonetheless. They now have a performant SQL-based analytics engine that can credibly compete with the best DWs in the market today, and it's just one part of their overall platform.
The sense I get is that Snowflake wants the conversation to be "no matter what you do, you need a data warehouse, and we're the best in the business at that." Databricks' Lakehouse approach is a fundamental challenge to that, and if they're getting this kind of performance from their analytics engine against the market-leading data warehouses today, that's a big momentum shift in their favour.