I've worked on a few SQL systems used for analytics and ETL. For the purposes of this discussion, my users fell into three categories:

1. Analysts who prefer spreadsheets
2. Data scientists who prefer pandas
3. Engineers who prefer C++/Java/JavaScript/Python

I'm fairly sure SQL isn't the first choice for any of them. But in all three cases, a modern vectorized SQL engine will be the fastest option for expressing and executing many analysis and ETL tasks, especially when the datasets don't fit on a single machine. It's also easier to provide a shared pool of compute for running SQL than for running arbitrary code, especially with low latency.

Even as a query engine developer, I would rather use a SQL engine than hand-write the pipeline. Implementing even the basic optimizations a modern engine performs (columnar execution, predicate pushdown, pre-aggregation before shuffles, etc.) would take me at least a week. A bit less if I had already built up a large library to help.
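To give a sense of what hand-rolling just two of those optimizations involves, here is a toy Python sketch of the query `SELECT region, SUM(amount) FROM sales WHERE amount > 0 GROUP BY region` spread across three worker partitions. It is my own illustration, not any engine's code. The data, `reducer_for`, `naive_shuffle`, and `optimized_shuffle` are all hypothetical. The naive version ships every raw row and filters after the shuffle. The optimized version evaluates the filter before any data moves and pre-aggregates each partition, so only one partial sum per key per partition crosses the network.

```python
from collections import defaultdict

# Hypothetical input: each inner list is one worker's partition of (region, amount) rows.
partitions = [
    [("us", 10), ("eu", 5), ("us", 7), ("apac", 1)],
    [("eu", 3), ("us", 2), ("eu", 8)],
    [("apac", 4), ("us", 6), ("us", -1)],
]

NUM_REDUCERS = 2


def reducer_for(key, num_reducers=NUM_REDUCERS):
    # Deterministic toy partitioner (Python's built-in hash() of str is salted per process).
    return sum(key.encode()) % num_reducers


def naive_shuffle(parts):
    """Ship every raw row to its reducer, then filter and aggregate there."""
    outboxes = [[] for _ in range(NUM_REDUCERS)]
    for part in parts:
        for key, amount in part:
            outboxes[reducer_for(key)].append((key, amount))
    shipped = sum(len(box) for box in outboxes)
    totals = defaultdict(int)
    for box in outboxes:
        for key, amount in box:
            if amount > 0:  # filter applied late, after all rows have moved
                totals[key] += amount
    return dict(totals), shipped


def optimized_shuffle(parts):
    """Filter early and pre-aggregate per partition before shuffling."""
    outboxes = [[] for _ in range(NUM_REDUCERS)]
    for part in parts:
        partial = defaultdict(int)
        for key, amount in part:
            if amount > 0:  # predicate evaluated before any data moves
                partial[key] += amount
        # Only one (key, subtotal) pair per key leaves each partition.
        for key, subtotal in partial.items():
            outboxes[reducer_for(key)].append((key, subtotal))
    shipped = sum(len(box) for box in outboxes)
    totals = defaultdict(int)
    for box in outboxes:
        for key, subtotal in box:
            totals[key] += subtotal  # reducers merge partial sums
    return dict(totals), shipped


if __name__ == "__main__":
    naive_totals, naive_rows = naive_shuffle(partitions)
    opt_totals, opt_rows = optimized_shuffle(partitions)
    print(naive_totals == opt_totals, naive_rows, opt_rows)  # → True 10 7
```

Both versions return the same totals, but the optimized one ships 7 rows instead of 10. On real data with few distinct keys per partition, that gap becomes orders of magnitude. And this still leaves out columnar layout, vectorized execution, spilling, and skew handling, all of which an engine gives you for free.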