Hacker News
by missosoup 2428 days ago
What does spark win at exactly?

Dask+Perfect is a much better experience all round including perf, with virtually none of the cluster management hell involved.

3 comments

Could you talk about Prefect? We are in the process of moving from Spark to Dask. I have never heard of Prefect. What do you use it for?
tl;dr we use it for a similar set of tasks that one would use Airflow for.

Unlike Airflow, it lends itself to microbatching and streaming. Plus it ticks off a bunch of housekeeping items that Airflow never got around to. With a bit of devops engineering time, you can have Prefect manage the size of your worker cluster on k8s and scale it up/down with ingest demand, etc.

I'll say one thing though. The Prefect website used to be a lot more technical and explicit about what it is and isn't. Now it's mostly sales gobbledegook. Maybe not a good sign. I've seen this happen before with Dremio.

This is super interesting!

Do you run Dask on k8s? I have been concerned that Dask does not leverage the Kubernetes HPA for autoscaling, but instead chooses to run an external scheduler.

How has your experience been?

Very interesting. Can't find references to "Perfect", though; could you please point to a link?
https://www.prefect.io

Not the most SEO-friendly choice of name. Great product though.

Are you using their cloud product? The core/open source product doesn't have a way to persist schedule data.
Looks like Dask is Python-only, so it's a nonstarter (loser) for already existing JVM code that runs on Spark.
Spark stacks inevitably end up with PySpark though. It's rework for people who already committed to Spark, sure. And for bigger projects that committed to Spark this change isn't justifiable. But for a greenfield project, choosing Spark is just silly today.