Hacker News
by rxin 4276 days ago
Thanks for sharing this. I'm the author of this blog post. Feel free to ask me anything.

Your post mentions "single-root I/O virtualization" (SR-IOV) as a factor in maximizing network performance. I'm wondering what impact it had on your sort. Do you have data for runs where you didn't enable it?
It was part of enhanced networking. Without enhanced networking we were getting about 600 MB/s, versus 1.1 GB/s with it.
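To put those two throughput figures in proportion, here is a small back-of-the-envelope sketch. It assumes decimal units (1.1 GB/s = 1100 MB/s) and treats both numbers as sustained throughput; the 100 GB transfer volume is a hypothetical value chosen only for illustration, not a figure from the post.

```python
# Back-of-the-envelope comparison of the two quoted network throughputs.
# Assumption: decimal units, so 1.1 GB/s == 1100 MB/s.
WITHOUT_ENHANCED_MB_S = 600.0
WITH_ENHANCED_MB_S = 1100.0


def speedup(before_mb_s: float, after_mb_s: float) -> float:
    """Ratio of the new throughput to the old throughput."""
    return after_mb_s / before_mb_s


def transfer_seconds(volume_gb: float, rate_mb_s: float) -> float:
    """Seconds to move volume_gb at a sustained rate_mb_s (decimal units)."""
    return volume_gb * 1000.0 / rate_mb_s


# Hypothetical volume, for illustration only.
VOLUME_GB = 100.0

ratio = speedup(WITHOUT_ENHANCED_MB_S, WITH_ENHANCED_MB_S)
slow = transfer_seconds(VOLUME_GB, WITHOUT_ENHANCED_MB_S)
fast = transfer_seconds(VOLUME_GB, WITH_ENHANCED_MB_S)

print(f"speedup: {ratio:.2f}x")  # speedup: 1.83x
print(f"{VOLUME_GB:.0f} GB: {slow:.0f} s without vs {fast:.0f} s with")
# 100 GB: 167 s without vs 91 s with
```

Under these assumptions, enhanced networking gives roughly a 1.8x throughput improvement, so a phase that is purely network-bound would take a little over half as long.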
Hi Reynold! Do you have numbers or intuition for how previous versions of Spark would have performed? I'm upgrading (soon) from Spark 0.8 to Spark 1.1 and am curious about the performance gains (especially with respect to shuffles).
Hi Austin,

We haven't tested Spark 0.8 at this scale. In general, Spark is advancing at such a rapid rate that 1.1 is very different from 0.8.

I'm curious whether using the Sparrow scheduler would have been a net gain or loss for this type of workload.
It would help a little (maybe a few percent), but not much, because scheduling latency was low relative to task run time: the largest scheduling delay was about 10 seconds, whereas each task took minutes.
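Why a faster scheduler would help only marginally comes down to the ratio of scheduling delay to task run time. A minimal sketch under a simple model: the ~10-second worst-case delay comes from the reply above, while the 2, 5 and 10 minute task durations are hypothetical stand-ins for "each task took minutes". The computed share is an upper bound on what removing the delay entirely could save.

```python
# Fraction of a task's wall-clock time spent waiting to be scheduled.
# The 10 s delay is the quoted worst case; the task durations are
# hypothetical values standing in for "each task took minutes".
WORST_SCHEDULING_DELAY_S = 10.0


def scheduling_overhead(delay_s: float, task_s: float) -> float:
    """Share of (delay + run time) attributable to scheduling delay."""
    return delay_s / (delay_s + task_s)


for minutes in (2, 5, 10):
    task_s = minutes * 60.0
    share = scheduling_overhead(WORST_SCHEDULING_DELAY_S, task_s)
    print(f"{minutes:>2} min task: worst-case overhead {share:.1%}")
```

Even at the worst-case delay the overhead stays in the single-digit percent range (about 7.7%, 3.2% and 1.6% for the three durations), and since 10 seconds was the largest delay, other tasks waited no longer than that. This is consistent with the "maybe a few percent" estimate.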