by slotrans
2061 days ago
I'm so confused. These examples are all using the SQL-like features of Spark. Not a map() or flatMap() in sight. So... why not just write SQL?

    df.registerTempTable('some_name')
    new_df = spark.sql("""select ... from some_name ...""")

All of this F.col(...) and .alias(...) and .withColumn(...) nonsense is a million times harder to read than proper SQL. I just don't understand what any of this is intended to accomplish.
|
At first glance, the F.col() / .withColumn() syntax isn't as intuitive as pure SQL, but it has a lot of advantages once you get used to it. You can build abstractions, use programming language features like loops, and get proper IDE support.
I find the PySpark syntax to be uglier than Scala's. Lots of teams are terrified of using Scala, though, and that's the reason PySpark is so popular.