by jamesblonde
2234 days ago
It's worse than that. Shuffle for Spark on Kubernetes is fundamentally broken and hasn't been fixed yet. The problem is that Docker containers cannot (for security reasons) share the same host-level disks. There is no external shuffle service, and disk caching is container-local (it does not use kernel-level disk I/O buffering), which kills performance. Google's proposed solution below is to store shuffle files on NFS, which is not going to be performant. Stick with YARN for Spark and only switch once shuffle is fixed for k8s. Databricks are in no rush to get shuffle fixed for k8s. References:
https://youtu.be/GbpMOaSlMJ4?t=1617
https://t.co/KWDNHjudfY?amp=1
https://issues.apache.org/jira/browse/SPARK-25299
We are currently trying to fix the first problem in a different context (not Spark), where worker containers store intermediate shuffle files on local disks mounted as hostPath volumes. The performance penalty is about 50% compared with running everything natively. Besides that, some containers occasionally get stuck for a long time. I believe the Spark community will run into the same problem if they choose to use local disks for storing intermediate files.