
The spark.sql("""select ... from some_name ...""") bit is how you run pure SQL from the Scala or Python execution context. If you're already in the SQL execution context, you don't need that wrapper. I never write Spark code like this.

At first glance, the F.col(), .withColumn() syntax isn't as intuitive as pure SQL, but it has a lot of advantages once you get used to it. You can build abstractions, use programming language features like loops, and get proper IDE support.

I find the PySpark syntax uglier than the Scala equivalent. Lots of teams are terrified of using Scala, though, and that's the reason PySpark is so popular.