by alexott 2179 days ago
It would be interesting to see Koalas compared as well.
1 comment

Koalas is the Pandas API on top of Apache Spark for anyone that's interested: https://github.com/databricks/koalas

It works similarly to PySpark and scales to massive datasets (hundreds of terabytes). Koalas is probably the best bet if you're working on a massive dataset and want the Pandas API. Or you can simply use PySpark, which has a cleaner interface.
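
For a rough sense of the difference, here's a minimal sketch of the same group-by mean written against both APIs. The function names, the sample rows, and the local Spark session settings are made up for illustration; it assumes the `koalas` and `pyspark` packages are installed, with a Spark version that Koalas supports. As far as I know, Koalas was later folded into Spark itself as `pyspark.pandas`, so on newer Spark versions the import would likely differ.

```python
# Hypothetical sample data: (key, value) pairs.
SAMPLE_ROWS = [("a", 1.0), ("b", 2.0), ("a", 3.0)]


def mean_by_key_koalas(rows):
    """Group-by mean using the Pandas-style Koalas API."""
    # Imported lazily so the module loads even without Spark installed.
    import databricks.koalas as ks

    kdf = ks.DataFrame({
        "key": [k for k, _ in rows],
        "value": [v for _, v in rows],
    })
    # Same syntax you would write against a plain pandas DataFrame.
    means = kdf.groupby("key")["value"].mean()
    return means.to_pandas().to_dict()


def mean_by_key_pyspark(rows):
    """The same aggregation using the native PySpark DataFrame API."""
    from pyspark.sql import SparkSession, functions as F

    # Local single-core session, an assumption for a quick experiment.
    spark = (
        SparkSession.builder.master("local[1]")
        .appName("koalas-vs-pyspark")
        .getOrCreate()
    )
    sdf = spark.createDataFrame(rows, ["key", "value"])
    result = sdf.groupBy("key").agg(F.avg("value").alias("mean_value"))
    return {row["key"]: row["mean_value"] for row in result.collect()}
```

Calling `mean_by_key_koalas(SAMPLE_ROWS)` and `mean_by_key_pyspark(SAMPLE_ROWS)` should give the same per-key means; which one reads as "cleaner" is mostly a matter of whether you already think in Pandas.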