by mlthoughts2018
2667 days ago

I agree that Spark may be a fine choice for ETL and generic pipeline tasks. But many companies choose it as the computation layer for their data warehouse and then enforce a policy of standardizing everything around it, including tasks like machine learning that are poorly suited to Spark. Worse, companies like Databricks encourage this standardization and act like yes-man consultants, promising that their Spark ML offerings can solve every problem. You quickly end up with a brittle monster of a data warehouse. It is organized around what is convenient for Spark, even though Spark can't effectively solve the problems at hand. Piping data to any non-Spark system is deeply inconvenient. And nobody is sympathetic to budget requests for other systems, because the whole budget was spent on Spark.