by mythhouse 1246 days ago
Yes it does, and it's bleak. I don't understand the valuation, which rests on the assumption that even the smallest startup will need to write some fancy map-reduce Spark jobs to do analytics and AI. Most companies are best served by a warehouse like Snowflake and a realtime layer for analytics. I don't understand the value add of Databricks.
2 comments

But you don't have to write map-reduce jobs at all? You can just write SQL queries or Pandas programs, and they automatically get parallelized by Databricks. Databricks is a data warehouse (just like Snowflake).

https://www.databricks.com/product/databricks-sql

In a twist, pandas programs don't get parallelized on Spark. Someone had to go and write a parallel layer that duplicated the pandas API, because otherwise you ended up with the entire pandas program executing on a single executor.
There is the pandas API on Spark, included in Spark itself (originally Koalas). The switch to it is very easy, and you get parallelization.
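For anyone who hasn't seen it, here is a rough sketch of what that switch looks like. The function name `revenue_by_region` and the sample data are made up for illustration. The only real API assumed is that `pyspark.pandas` (shipped with Spark 3.2 and later) mirrors the pandas `DataFrame`, `groupby`, and `sum` calls used here. The Spark half is left commented out because it needs pyspark installed and a Spark session.

```python
import pandas as pd


def revenue_by_region(frames, rows):
    """Total revenue per region, using whichever pandas-style module is passed in.

    `frames` is either the `pandas` module or `pyspark.pandas`; both expose
    DataFrame, groupby and sum with the same spelling.
    """
    df = frames.DataFrame(rows)
    return df.groupby("region")["revenue"].sum()


# Hypothetical sample data, made up for illustration.
rows = {
    "region": ["eu", "us", "eu", "us"],
    "revenue": [10, 20, 30, 40],
}

# Plain pandas: the whole computation runs in a single process.
local_totals = revenue_by_region(pd, rows).sort_index()
print(local_totals.to_dict())

# pandas API on Spark: same call, but the work is distributed across executors.
# Assumes pyspark 3.2 or later is installed, so it is left commented out here.
# import pyspark.pandas as ps
# spark_totals = revenue_by_region(ps, rows).sort_index()
# print(spark_totals.to_dict())
```

In practice the change is often just the import line, though not every pandas method is covered and some operations behave differently on distributed data, so treat this as a starting point and not a guarantee.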
FWIW what we see are whole different categories of workloads. For primarily API-driven microservice workloads, ETL of data stores into Snowflake makes sense. But for primarily batch or stream workloads (implemented literally as batches or as data streams that have varying unit-of-work semantics), where the target data model isn't read-only analytics but read-write operational, something like Spark can make a lot of sense.