| HN Mirror



	by mythhouse 1246 days ago
	Yes it does, and it's bleak. I don't understand a valuation based on the assumption that even the smallest startup will need to write fancy map-reduce Spark jobs to do analytics and AI. Most companies are best served by a warehouse like Snowflake and a realtime layer for analytics. I don't understand the value add of Databricks.

2 comments

solidangle 1245 days ago

But you don't have to write map-reduce jobs at all? You can just write SQL queries or Pandas programs, and they automatically get parallelized by Databricks. Databricks is a data warehouse (just like Snowflake).

https://www.databricks.com/product/databricks-sql


legerdemain 1245 days ago

In a twist, pandas programs don't get parallelized on Spark. Someone had to go and write a parallel layer that duplicated the pandas API, because otherwise you ended up with the entire pandas program executing on a single executor.


alexott 1237 days ago

There is Pandas on Spark, included in Spark itself (originally Koalas). Switching to it is very easy, and you get parallelization.
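
To make the "easy switch" concrete, here is a minimal sketch, assuming Spark 3.2 or later, where the pandas API ships as the `pyspark.pandas` module. The data and the helper names (`ROWS`, `events_per_team`, `local_result`, `spark_result`) are made up for illustration and are not from this thread. Not every pandas call is covered by the Spark version, so treat this as an outline rather than a guaranteed drop-in.

```python
import pandas as pd

# Hypothetical sample data, purely for illustration.
ROWS = {"team": ["a", "b", "a", "b"], "events": [1, 2, 3, 4]}


def events_per_team(frame_lib, rows):
    """Sum events per team with whichever DataFrame library is passed in."""
    df = frame_lib.DataFrame(rows)
    return df.groupby("team")["events"].sum()


def local_result():
    # Plain pandas: the whole computation runs in one Python process.
    return events_per_team(pd, ROWS).to_dict()


def spark_result():
    # pandas API on Spark: same calls, but the work is planned and run by Spark.
    # Imported lazily because it needs a PySpark installation (assumed 3.2+).
    import pyspark.pandas as ps

    # to_pandas() collects the (small) result back to the driver.
    return events_per_team(ps, ROWS).to_pandas().to_dict()


if __name__ == "__main__":
    print(local_result())
```

The only thing that changes between the two paths is the module handed to `events_per_team`, which is roughly what swapping `import pandas as pd` for `import pyspark.pandas as ps` amounts to in an existing script.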


jonahbenton 1245 days ago

FWIW, what we see are entirely different categories of workloads. For primarily API-driven microservice workloads, ETL of data stores into Snowflake makes sense. But for primarily batch or stream workloads (implemented literally as batches or as data streams with varying unit-of-work semantics, and where the target data model isn't read-only analytics but read-write operational), something like Spark can make a lot of sense.
