Hacker News
by dastbe 3824 days ago
Databricks has a page that describes the pitfalls: https://databricks.gitbooks.io/databricks-spark-knowledge-ba...

I don't know whether the OutOfMemoryError can still occur in recent versions of Spark, but the performance impact of groupByKey is very real.
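To see where the cost comes from, here is a rough pure-Python simulation (not Spark code, and the function names are hypothetical). It contrasts grouping every raw pair before summing, as `rdd.groupByKey().mapValues(sum)` does, with combining inside each partition first, as `rdd.reduceByKey(lambda a, b: a + b)` does. The "partitions" are plain lists, and "shuffled" simply counts the records that would cross the network. It assumes a simple sum aggregation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int]


def simulate_group_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """Hypothetical model of groupByKey + sum: every raw pair is shuffled."""
    shuffled = 0
    groups: Dict[str, List[int]] = defaultdict(list)
    for part in partitions:
        for key, value in part:
            # Each pair is sent to its reducer as-is, and all values for a key
            # are held together in one list (the source of the memory pressure).
            groups[key].append(value)
            shuffled += 1
    return {k: sum(vs) for k, vs in groups.items()}, shuffled


def simulate_reduce_by_key(partitions: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """Hypothetical model of reduceByKey: combine per partition, then shuffle partials."""
    shuffled = 0
    totals: Dict[str, int] = defaultdict(int)
    for part in partitions:
        local: Dict[str, int] = defaultdict(int)
        for key, value in part:
            local[key] += value  # map-side combine before the shuffle
        for key, partial in local.items():
            # Only one partial sum per key per partition crosses the shuffle.
            totals[key] += partial
            shuffled += 1
    return dict(totals), shuffled


partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1)],
]

group_result, group_shuffled = simulate_group_by_key(partitions)
reduce_result, reduce_shuffled = simulate_reduce_by_key(partitions)
print(group_result, group_shuffled)    # {'a': 4, 'b': 3} 7
print(reduce_result, reduce_shuffled)  # {'a': 4, 'b': 3} 4
```

Both approaches produce the same totals. In this model the combine-first version moves 4 records instead of 7, and the gap grows with the number of repeated keys per partition.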