by loveparade
1043 days ago
I guess it's surprising then that both Hadoop/Hive and Spark, which were the originators of SQL for ETL, typically work on data lakes instead of RDBMSs. In fact, RDBMS support didn't come for a long time. The choice of SQL has nothing to do with RDBMSs. It's because SQL is a declarative language that's easy to parse and convert into a physical query plan that can be parallelized and optimized extremely well. Why is that? Because it's not a general-purpose imperative loosely typed brittle language like Python.
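To make the "declarative query, separate physical plan" point concrete, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is only a stand-in here (it is not Hive or Spark), and the `orders` table, the `idx_orders_customer` index, and the `query_plan` helper are made up for illustration. The query text never changes; only the plan the engine picks does.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable 'detail' column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # 4th column holds the plan description


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 10, float(i)) for i in range(100)],
)

# The query only states WHAT is wanted, not HOW to fetch it.
sql = "SELECT * FROM orders WHERE customer_id = ?"

plan_before = query_plan(conn, sql, (3,))  # no index yet: full table scan

# Hypothetical index; the query text above stays exactly the same.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, sql, (3,))  # planner can now use the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines differs between SQLite versions, so treat the printed text as informational rather than something to parse.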
They weren’t.
I guarantee you, before either of those existed, using SQL for ETL was absolutely normal. Back then, Data Warehousing was often done with a different version/configuration of the same brand of RDBMS as the transactional store, with the transactional store likely using something closer to a normalized schema and the warehouse using a star or snowflake schema.
Which is why newer data warehousing / data lake systems support SQL even though they aren’t RDBMSs: a couple decades of RDBMS dominance made it the JavaScript of data storage.
> Because it’s not a general-purpose imperative loosely typed brittle language like Python.
It’s not general-purpose or imperative, but it’s just as “loosely typed” as Python, which is to say not at all: both Python and SQL are strongly typed.
It’s not clear what concrete meaning “brittle” is supposed to have in this claim, so I can’t evaluate its accuracy.
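For the Python half of the typing point, a minimal sketch (the `add` helper is just an illustration): Python is dynamically typed, so a name can be rebound to a value of another type, but it is strongly typed in the sense that it won't silently coerce unrelated types in an operation.

```python
def add(a, b):
    """Plain addition; Python checks operand types at run time."""
    return a + b


# Strong typing: mixing int and str raises instead of coercing.
try:
    add(1, "1")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Dynamic typing: the same name may be rebound to a different type.
x = 1
x = "one"

print(mixed_ok)
```

How strictly a given SQL engine enforces types varies by implementation, so this sketch deliberately says nothing about any particular database.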