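I don't know whether the OutOfMemory exception can still occur in recent versions of Spark, but the performance impact of groupByKey is very real: it shuffles every key-value pair across the network with no map-side combine, whereas reduceByKey and aggregateByKey merge values within each partition before the shuffle.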
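To make the cost concrete, here is a rough toy model in plain Python (not Spark itself) that counts how many records would cross the shuffle in each case. The function names and the sample partitions are made up for illustration. In real Spark code the two variants would correspond roughly to `rdd.groupByKey().mapValues(sum)` and `rdd.reduceByKey(add)`.

```python
from collections import defaultdict
from operator import add


def shuffle_group_by_key(partitions):
    """Toy model of groupByKey: every (key, value) record is shuffled as-is."""
    shuffled = [pair for part in partitions for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)


def shuffle_reduce_by_key(partitions, func):
    """Toy model of reduceByKey: combine inside each partition, then shuffle."""
    shuffled = []
    for part in partitions:
        combined = {}
        for key, value in part:
            # Map-side combine: at most one record per key leaves a partition.
            combined[key] = func(combined[key], value) if key in combined else value
        shuffled.extend(combined.items())
    reduced = {}
    for key, value in shuffled:
        # Reduce side: merge the partial results from all partitions.
        reduced[key] = func(reduced[key], value) if key in reduced else value
    return reduced, len(shuffled)


# Hypothetical input: two partitions of (word, 1) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("c", 1)],
]

grouped, grouped_records = shuffle_group_by_key(partitions)
sums_via_group = {key: sum(values) for key, values in grouped.items()}
sums_via_reduce, reduced_records = shuffle_reduce_by_key(partitions, add)

print(sums_via_group == sums_via_reduce)  # True: same final result
print(grouped_records, reduced_records)   # 8 5: fewer records shuffled
```

The gap grows with the number of repeated keys per partition, so on skewed or highly duplicated data the difference can be far larger than in this small example.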