Hacker News
by ramesh31 1398 days ago
I'm of the mind that Snowflake and Databricks are losing their value prop now that Delta Lake is open source and Iceberg is maturing. What's to stop me from rolling my own Spark clusters and just using one of those? Is anyone doing this?
2 comments

>What's to stop me from rolling my own Spark clusters and just using one of those? Is anyone doing this?

Ops. Unless your core competency is running reports and Spark nodes, it's probably cheaper to outsource the management of Spark and friends than to hire people to make sure it's always up and running. To be fair, I haven't touched Spark in many years, but having to page someone who is good enough at Spark to debug why a job stopped at 3am isn't fun.

>Ops. Unless your core competency is running reports and Spark nodes, it's probably cheaper to outsource the management of Spark and friends than to hire people to make sure it's always up and running.

I think as an end user I would absolutely agree on this point. But many companies use Databricks as part of their automated backend systems that they resell to customers. The cost per DBU is astronomical for the amount of raw compute in use. It feels a bit like running a restaurant where you serve takeout.

[Disclaimer: Databricks employee] There's also a lot of value in DBSQL, Unity Catalog (data management), and serverless for autoscaling, all of which can save money compared with just running raw Spark. But if you want to operate Spark yourself, cool, do it. We're happy for that; it builds the base of Spark committers over time and increases the quality of our products.
I can spin up and down 100+ node clusters on the 4 largest cloud providers at will.

What ops am I missing?

You'll find plenty of the customer base of Databricks used to run their own clusters.

It's a tradeoff. It might cost fewer dollars but more time. The time and expertise to run their own clusters effectively are not something every org has or wants to invest.

And to get the very best price for those clusters, you'd need to commit to the CSP for three years!

Would love to know the TCO trade-off between procuring, securing and deploying on your own clusters vs having them managed via SaaS.