by dragonwriter 1043 days ago
> Hadoop/Hive and Spark, which were the originators of SQL for ETL

They weren’t.

I guarantee you, before either of those existed, when Data Warehousing was often done with a different version/configuration of the same brand of RDBMS as the transactional store (the latter likely using something closer to a normalized schema, the former using a star or snowflake schema), using SQL for ETL was absolutely normal.

Which is why newer data warehousing / data lake systems support SQL even though they aren’t RDBMSs: a couple decades of RDBMS dominance made it the JavaScript of data storage.

> Because it’s not a general-purpose imperative loosely typed brittle language like Python.

It's not general-purpose or imperative, but it's just as “loosely typed” as Python (both Python and SQL are strongly typed).
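To make the Python half of that concrete, here is a minimal sketch (the function name `add_mixed` is made up for illustration). Python is dynamically typed, but it does not silently coerce between unrelated types, which is what "strongly typed" usually means in this context:

```python
def add_mixed():
    """Try to add an int and a str; Python refuses to coerce implicitly."""
    try:
        return 1 + "1"
    except TypeError as exc:
        # The interpreter raises instead of guessing "11" or 2.
        return f"TypeError: {exc}"


result = add_mixed()
print(result)
```

How strictly a given SQL engine behaves on the equivalent expression varies by engine and configuration, so treat this only as an illustration of the Python side.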

It's not clear what concrete meaning “brittle” is supposed to have in this claim, so I can’t evaluate its accuracy.

1 comment

Definitely, I can jump into what we meant by brittle. We mainly meant that SQL scripts are hard to debug and undescriptive: you can't parametrize or customize the error messages you receive from transforms, and you can only execute one complete statement at a time, with the steps often chained together as CTEs (which is a nightmare if it's a 400-line statement). Python makes debugging easier because it turns the approach from declarative to procedural, and you can even set breakpoints when you write your actual transformers in Python.
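A rough sketch of what that procedural style can look like, assuming rows are plain dicts. The names `TransformError`, `run_pipeline`, `drop_null_amounts`, and `to_cents` are hypothetical and not from any particular tool. Each step is a separate function, so a failure can be reported with a custom message naming the step, and a breakpoint can be placed inside any one of them:

```python
from typing import Callable

Row = dict
Step = Callable[[list[Row]], list[Row]]


class TransformError(Exception):
    """Raised with the name of the failing step in the message."""


def drop_null_amounts(rows: list[Row]) -> list[Row]:
    # Keep only rows that actually carry an amount.
    return [r for r in rows if r.get("amount") is not None]


def to_cents(rows: list[Row]) -> list[Row]:
    # float() raises ValueError on malformed input such as "oops".
    return [{**r, "amount_cents": round(float(r["amount"]) * 100)} for r in rows]


def run_pipeline(rows: list[Row], steps: list[Step]) -> list[Row]:
    """Run steps in order, wrapping any failure in a descriptive error."""
    for step in steps:
        try:
            rows = step(rows)
        except Exception as exc:
            raise TransformError(
                f"step '{step.__name__}' failed on {len(rows)} input rows: {exc}"
            ) from exc
    return rows


rows = [
    {"id": 1, "amount": "12.50"},
    {"id": 2, "amount": None},
    {"id": 3, "amount": "oops"},
]

try:
    run_pipeline(rows, [drop_null_amounts, to_cents])
    message = ""
except TransformError as exc:
    message = str(exc)

print(message)
```

Calling `breakpoint()` inside `to_cents` would drop into the debugger with the intermediate rows in scope, which is the kind of inspection a single chained SQL statement does not offer.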