by MrPowers 2179 days ago
Koalas is the Pandas API on top of Apache Spark for anyone that's interested: https://github.com/databricks/koalas

It works similarly to PySpark and scales to massive datasets (hundreds of terabytes). Koalas is probably the best bet if you're working with a massive dataset and want the Pandas API. Or you can simply use PySpark, which has a cleaner interface.