by Epa095 1270 days ago
There are also pandas UDFs, which use Arrow as the exchange format. I assume it still has to copy the data (?), but it makes the (de)serialization fast and allows for vectorized operations.

https://spark.apache.org/docs/3.0.0/sql-pyspark-pandas-with-...
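
For anyone who hasn't seen one, here is a minimal sketch of a scalar pandas UDF, assuming PySpark 3.x with pandas and pyarrow installed. The function name, column names, and temperature data are made up for illustration. The point is that the function receives a whole pandas Series per batch rather than one row at a time, which is what allows the vectorized arithmetic.

```python
import pandas as pd


def celsius_to_fahrenheit(celsius: pd.Series) -> pd.Series:
    # Hypothetical example function: it operates on a whole batch
    # (a pandas Series) at once instead of row by row.
    return celsius * 9.0 / 5.0 + 32.0


def main() -> None:
    # Imported here so the plain pandas function above is usable without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = (
        SparkSession.builder.master("local[1]")
        .appName("pandas-udf-sketch")
        .getOrCreate()
    )

    # The Series -> Series type hints mark this as a scalar pandas UDF;
    # Spark exchanges the batches with the Python worker via Arrow.
    to_fahrenheit = pandas_udf(celsius_to_fahrenheit, returnType="double")

    # Illustrative data only.
    df = spark.createDataFrame([(0.0,), (37.0,), (100.0,)], ["celsius"])
    df.withColumn("fahrenheit", to_fahrenheit("celsius")).show()

    spark.stop()


if __name__ == "__main__":
    main()
```

Since the UDF body is an ordinary pandas function, it can be checked on a plain Series before it is ever handed to Spark.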