by AdamProut 972 days ago
Databricks has kept their Photon[1][2] query engine for Spark closed source thus far. Unless EMR has made equivalent changes to the Spark runtime they use, Databricks should be much faster. Photon brings to Spark the standard vectorized execution techniques that SQL data warehouses have used for many years.

[1] https://docs.databricks.com/en/clusters/photon.html
[2] https://dl.acm.org/doi/10.1145/3514221.3526054
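
For anyone unfamiliar with the term, here is a rough sketch of what "vectorized execution" means compared with row-at-a-time processing. This is not Photon's code (Photon is closed source); the function names, the `price`/`qty` columns, and the batch size are all made up for illustration. The idea is to evaluate a filter over a batch of column values, record the passing positions in a selection vector, and then aggregate only those positions.

```python
from array import array


def filter_sum_row_at_a_time(rows, threshold):
    """Row-oriented: evaluate the filter and the expression one row at a time."""
    total = 0.0
    for row in rows:
        if row["price"] > threshold:
            total += row["price"] * row["qty"]
    return total


def filter_sum_vectorized(price, qty, threshold, batch_size=1024):
    """Column-oriented: process fixed-size batches of each column.

    Hypothetical illustration only. A real engine would run tight compiled
    loops (often SIMD) over each batch; plain Python just shows the shape.
    """
    total = 0.0
    for start in range(0, len(price), batch_size):
        p = price[start:start + batch_size]
        q = qty[start:start + batch_size]
        # Selection vector: positions in this batch that pass the filter.
        sel = [i for i, v in enumerate(p) if v > threshold]
        # Aggregate only the selected positions.
        total += sum(p[i] * q[i] for i in sel)
    return total


rows = [
    {"price": 1.0, "qty": 2},
    {"price": 5.0, "qty": 3},
    {"price": 10.0, "qty": 1},
]
price = array("d", [r["price"] for r in rows])
qty = array("q", [r["qty"] for r in rows])

row_result = filter_sum_row_at_a_time(rows, 2.0)
vec_result = filter_sum_vectorized(price, qty, 2.0, batch_size=2)
```

Both functions compute the same answer; the speedup in a real engine comes from working over contiguous column batches in compiled code, which this sketch does not attempt to reproduce.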

1 comments

I am a bit hazy on the exact details of how we did it, since it's been some time, but we definitely did not use Photon because it was too expensive.

One of the issues was that we started experimenting with Delta tables, and EMR was horrible at leveraging them.