by mjburgess 1824 days ago
Is there anything you can say about Spark for data engineering (ETL)?

The most common reason for Spark use today is ETL + data lakes (i.e., cloud object stores and ETL in/out).

It seems actual analysis is happening in fast databases that receive data from the object stores.

Can anyone here comment on this paradigm?

1 comment

I don't have much insight into Spark, but I've been using Dataflow/Beam for ETL. It's been a pretty good experience. It follows the style of spinning up compute to process as needed, then shutting it down.