by thedood
687 days ago

This is a specialized ETL use case, similar to taking a single SQL query and building a dedicated distributed application tailored to run only that query. The lower-level primitives in Ray Core (tasks and actors) are general-purpose enough to make building this type of application possible. But you'll be hard-pressed to quickly (i.e., with less than a day of effort) make an arbitrary SQL query or dataframe operation run more efficiently or at greater scale on Ray than on a dedicated data processing framework like Spark. IMO, the main value-add of frameworks like Spark lies in unlocking "good enough" efficiency and scale for almost any ETL job relatively quickly and easily, even if they don't run your particular ETL job optimally.

It's unclear whether it's in Anyscale's best interests to promote Ray as a general-purpose cluster productivity tool, even if Ray is good at that more general use case.
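To make the "dedicated application for a single query" idea concrete, here is a minimal, hypothetical sketch of a hand-written plan for exactly one query, `SELECT key, SUM(value) FROM t GROUP BY key`. The input format, the partitioning, and the function names are all assumptions for illustration. `ThreadPoolExecutor` stands in for Ray tasks so the sketch runs without a cluster (threads won't give real parallelism for CPU-bound Python because of the GIL).

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: rows of (key, value), already split into partitions.
Row = tuple[str, int]


def partial_sum(partition: list[Row]) -> Counter:
    """Map stage: aggregate one partition locally.

    In a Ray Core version this would plausibly be a remote task.
    """
    acc: Counter = Counter()
    for key, value in partition:
        acc[key] += value  # missing keys start at 0
    return acc


def group_by_sum(partitions: list[list[Row]], workers: int = 4) -> dict[str, int]:
    """Hand-written plan for one query only:
    SELECT key, SUM(value) FROM t GROUP BY key
    """
    # Fan out: one map task per partition (local threads stand in for Ray tasks).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, partitions))
    # Reduce stage: merge per-partition results on the driver.
    # Counter.update adds counts (and keeps zero/negative totals).
    total: Counter = Counter()
    for p in partials:
        total.update(p)
    return dict(total)


if __name__ == "__main__":
    parts = [[("a", 1), ("b", 2)], [("a", 3)], [("c", 5), ("b", -1)]]
    print(group_by_sum(parts))
```

In Ray Core, `partial_sum` would likely be decorated with `@ray.remote` and the driver would collect results with `ray.get`. Either way, changing the query (adding a filter, a join, or a second aggregate) means rewriting this plan by hand, which is the work a general-purpose engine like Spark does for you.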