by MrPowers
2179 days ago
Koalas is the Pandas API on top of Apache Spark, for anyone who's interested: https://github.com/databricks/koalas It works similarly to PySpark and scales to massive datasets (hundreds of terabytes). Koalas is probably the best bet if you're working with data at that scale and want the Pandas API. Or you can simply use PySpark, which has a cleaner interface.
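
To make the comparison concrete, here's a rough sketch of the same group-by average written both ways. The function names, the `key`/`value` columns, and the `to_rows` helper are made up for illustration; the Spark imports are done lazily inside the functions, so you need the `koalas` and `pyspark` packages (and a running SparkSession for the second one) only when you actually call them.

```python
def to_rows(columns):
    """Turn a column dict like {"key": [...], "value": [...]} into row tuples."""
    return list(zip(*columns.values()))


def mean_by_key_koalas(columns):
    """Pandas-style group-by via Koalas (hypothetical helper, needs koalas + pyspark)."""
    # In Spark 3.2+ the same API ships with Spark itself as `pyspark.pandas`.
    import databricks.koalas as ks

    kdf = ks.DataFrame(columns)                 # built like a pandas DataFrame
    return kdf.groupby("key")["value"].mean()   # same call you would write in pandas


def mean_by_key_pyspark(spark, columns):
    """The same aggregation with the native PySpark DataFrame API (hypothetical helper)."""
    from pyspark.sql import functions as F

    # Column names are taken from the dict keys, rows from the zipped values.
    sdf = spark.createDataFrame(to_rows(columns), list(columns))
    return sdf.groupBy("key").agg(F.avg("value").alias("value"))
```

Calling either one with something like `{"key": ["a", "a", "b"], "value": [1.0, 3.0, 5.0]}` should give the per-key averages; the Koalas version hands back a pandas-like Series, the PySpark one a Spark DataFrame.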