by dominotw 1989 days ago
> best distributed ML system out there

I was comparing it in the context of a "traditional" data engineering stack that used Spark for data munging, transformations, etc.

I don't have much insight into ML systems or how Spark fits there. Not all data teams are building 'ML systems', though. The parent comment wasn't referring to any 'ML systems'; I'm not sure why that would be automatically inferred when someone mentions a data stack.

1 comment

Yeah, I suppose. I kinda think that distributed SQL is a mostly commoditised space, and wondered what replaced Spark for distributed training.

For context, I'm a DS who's spent far too much time not being able to run useful models because of hardware limitations, and a Spark cluster is incredibly good for getting around that.

Additionally, I'd argue in favour of Spark even for ETL, as the ability to write (and test!) complicated SQL queries in R, Python and Scala was super, super transformative.

We don't really use Spark at my current place, and every time I write Snowflake (which is great, to be fair), I'm reminded of the inherent limitations of SQL and how wonderful Spark SQL was.

I'm weird though, to be fair.