Hacker News
by tandav 2067 days ago
I have the opposite experience. After trying PySpark functional pipelines (so many handy functions), plain SQL seems so hard to read and understand. The main problem is that the order of execution does not match the order of the code lines. https://i.stack.imgur.com/6YuwE.jpg
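
To make the ordering point concrete, here is a minimal sketch using only Python's standard library (`sqlite3`) rather than Spark. The table `orders`, its columns, and the sample rows are invented for illustration. The numbered comments in the query follow the usual logical evaluation order of a SELECT statement; the physical plan an engine actually picks may differ.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data, invented for this illustration.
ROWS = [
    ("alice", 30), ("bob", 5), ("alice", 20),
    ("carol", 70), ("bob", 3), ("carol", -5),
]


def totals_sql(rows):
    """SQL version: SELECT is written first but logically applied late."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    query = """
        SELECT customer, SUM(amount) AS total   -- logical step 5
        FROM orders                             -- logical step 1
        WHERE amount > 0                        -- logical step 2
        GROUP BY customer                       -- logical step 3
        HAVING SUM(amount) >= 10                -- logical step 4
        ORDER BY total DESC                     -- logical step 6
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


def totals_pipeline(rows):
    """Same logic, written top to bottom in the order it is applied."""
    # Steps 1-2: read the rows and keep positive amounts.
    positive = [(customer, amount) for customer, amount in rows if amount > 0]
    # Step 3: group by customer and sum.
    sums = defaultdict(int)
    for customer, amount in positive:
        sums[customer] += amount
    # Step 4: keep only groups whose total is at least 10.
    kept = [(customer, total) for customer, total in sums.items() if total >= 10]
    # Steps 5-6: the tuples are already the projected columns; sort by total.
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    print(totals_sql(ROWS))
    print(totals_pipeline(ROWS))
```

Both functions return the same rows; the difference is only in whether the reading order matches the order in which the steps are applied.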

Another thing is that Python is so cool for data processing, and when working with plain SQL I miss being able to write:

    .rdd.map(my_python_processing_function)
1 comment

Same for me. Python and Scala let users break up the logic into DataFrame transformations that can be unit tested, packaged into Wheel / JAR files, and easily reused in multiple contexts. Maintaining big, complex SQL codebases isn't easy.
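
A rough sketch of what that looks like, with a big caveat: it uses plain lists of dicts as a stand-in for Spark DataFrames so it runs without a Spark installation. The function names, column names, and sample rows are all hypothetical. In PySpark the same shape would be functions that take and return a DataFrame.

```python
from functools import reduce


def drop_empty_orders(rows):
    """Transformation 1: keep only rows with a positive quantity."""
    return [row for row in rows if row["qty"] > 0]


def add_total(rows):
    """Transformation 2: add a derived 'total' column (qty * price)."""
    return [{**row, "total": row["qty"] * row["price"]} for row in rows]


def run_pipeline(rows, *steps):
    """Apply each transformation in the order given."""
    return reduce(lambda acc, step: step(acc), steps, rows)


def test_drop_empty_orders():
    rows = [{"id": 1, "qty": 2, "price": 5}, {"id": 2, "qty": 0, "price": 9}]
    assert drop_empty_orders(rows) == [{"id": 1, "qty": 2, "price": 5}]


def test_run_pipeline():
    rows = [{"id": 1, "qty": 2, "price": 5}, {"id": 2, "qty": 0, "price": 9}]
    result = run_pipeline(rows, drop_empty_orders, add_total)
    assert result == [{"id": 1, "qty": 2, "price": 5, "total": 10}]


if __name__ == "__main__":
    test_drop_empty_orders()
    test_run_pipeline()
    print("all tests passed")
```

Each step is small enough to test on its own with a handful of rows, and the same functions can be reused in other pipelines, which is much harder to do with one long SQL string.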