by RBerenguel
2988 days ago

Being able to pass data through Arrow is a big improvement, but there's also a lot of serialisation overhead that you pay for in Python. Also, if you want to do anything in the fancier areas (like writing your own optimisation rule for the SparkSQL optimiser), it's Scala. Even something as simple as writing a custom aggregator is impossible in Python (at least it was in Spark 2.2; I haven't checked 2.3 or the "current" 2.4).