Hacker News
by jvican 148 days ago
Is there any plan for this?
2 comments

Funnily enough, I just added support (two weeks ago) for streaming from PySpark to Polars/DuckDB/etc. through the Arrow PyCapsule interface. By streaming, I mean actually streaming, not collecting all the data at once. It probably won't be released until May/June, but it's there: https://github.com/apache/spark/commit/ecf179c3485ba8bac72af...
Not that I’m aware of. The Spark ecosystem seems a little too “stable” to be putting effort into that kind of development.

Edit: hah, based on the sibling comment, I stand corrected