Hacker News new | ask | show | jobs
by fsociety 1117 days ago
Just install Apache Spark they said. It will be fun they said.

If you have the money, having a managed Spark instance with a bunch of added features can be a big win for some. There is a lot that goes into Spark maintenance.

1 comments

It also apparently includes some performance optimizations because they control both the hardware and software. And Delta Lake is pretty cool, and hosted MLFlow integration.
Databricks built a proprietary vectorized accelerator for Spark they call Photon. It's not just that they've tuned OSS Spark especially well.
Back when I was a customer (before Photon was released, also during) they had a very good tuning, in the order of around 2x faster for the workloads we had at the time (very large graph computation and a “simple” filtering)