You changed my perspective a little bit by asking the right questions.

> Moves from an architecture that is clustered for scale (ie. spark) to one that only scales vertically

I did a quick estimate of the volume, and we won't reach 1 TB for more than 5 years. We're not in a line of business where the number of clients can increase dramatically, so it's fairly predictable. I don't want to design for imaginary scaling issues.

> Potentially introduces yet more sources of truth for some data.

It is intended more to replace the current mess.

> SQL is terrible language to write transformations in (its a query language, not an ETL pipeline)

Actually, this is the point that concerns me the most: the need to transform the data in non-trivial ways. But surely people didn't wait for Spark to do this?

> Unless you can very clearly demonstrate that what you're making is meaningfully better

This is a very good point, and I think I should come up with a quick POC to demonstrate it and get buy-in.

> Could you perhaps find better way to orchestrate your spark tasks, eg. with airflow or ADF or AWS Glue or whatever?

I feel that it would just be solving the mess by adding more mess.
But I agreed with the parent comment's author on pretty much everything up to the third bullet point of the second list. I'd like to hear more of the reasoning behind his SQL hate.