| HN Mirror



	by rxin 4276 days ago
	The old entry had 10Gbps full-duplex networking (40 nodes per rack, 160Gbps from rack to spine, so 2.5:1 oversubscription), 64GB of RAM, and 12 x 3TB SATA disks. The network is probably the most important part here, and both entries have comparable networks.
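The 2.5:1 figure follows directly from the rack numbers quoted above. A minimal sketch of the arithmetic, assuming every node has a single 10Gbps link and the rack shares 160Gbps of uplink (the function name is illustrative, not from the thread):

```python
def oversubscription_ratio(nodes_per_rack: int, node_link_gbps: float, uplink_gbps: float) -> float:
    """Total node-facing bandwidth in a rack divided by its rack-to-spine bandwidth."""
    return nodes_per_rack * node_link_gbps / uplink_gbps


# Figures quoted in the comment: 40 nodes/rack, 10Gbps per node, 160Gbps to the spine.
ratio = oversubscription_ratio(nodes_per_rack=40, node_link_gbps=10, uplink_gbps=160)
print(f"{ratio}:1")  # prints "2.5:1"
```

In other words, if all 40 nodes send off-rack at line rate, each gets at most 4Gbps of the uplink.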

2 comments

discardorama 4276 days ago

Since each node was handling roughly 500GB of data but had only 244GB of memory, I think disk speed may have been the more critical factor. Their nodes used SSDs; the older nodes used spinning rust. The seek times alone will be a killer.


rxin 4276 days ago

Not sure why you mentioned seek time. In large-scale distributed sorting, I/O is mostly sequential.


tlipcon 4276 days ago

If only that were true -- the shuffle is typically seek-bound when the intermediate data doesn't fit into cache (plenty of papers show this pretty conclusively).


rxin 4276 days ago

Hi Todd,

Except that in the MR case, with 2100 nodes, the entire dataset fit in memory :)


lmeyerov 4276 days ago

Doesn't that speak to his point? Either smaller memory and more nodes, or more memory and fewer nodes. Why not do apples to apples? (This feels like the benchmarketing going on in browsers, which, at this point, is largely meaningless.)

Edit: on the other hand, this is an endorsement of the current wave of "per node performance stinks, let's avoid rewriting software for an extra year or two by throwing SSDs at it." Great for hardware vendors!


rxin 4276 days ago

No, it doesn't. The old record used 2100 nodes, so the entire dataset actually fit in memory. There shouldn't be much seeking happening even in the MR 2100 case. In Spark's case, the data actually doesn't fit in memory.

Also, this was primarily network bound. The old record had 2100 nodes with a 10Gbps network.
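The memory claim can be checked with back-of-the-envelope numbers. A rough sketch, assuming a 100TB dataset (the thread does not state the size, so `DATASET_TB` is an assumption) and using the per-node figures quoted in the comments: 64GB of RAM on the 2100 old nodes, and roughly 500GB of data against 244GB of RAM on the Spark nodes. It ignores OS, JVM, and buffer overhead, so it is an upper bound on what fits.

```python
def fits_in_memory(data_per_node_gb: float, ram_per_node_gb: float) -> bool:
    """True if a node's share of the data is no larger than its RAM (overheads ignored)."""
    return data_per_node_gb <= ram_per_node_gb


DATASET_TB = 100  # assumption: dataset size is not stated in the thread

# Old MR record: 2100 nodes with 64GB of RAM each (figures from the thread).
old_share_gb = DATASET_TB * 1000 / 2100
print(round(old_share_gb, 1), fits_in_memory(old_share_gb, 64))

# Spark entry: ~500GB of data per node, 244GB of RAM per node (figures from the thread).
print(fits_in_memory(500, 244))
```

Under these assumptions each old node holds under 50GB of data, well below its 64GB of RAM, while each Spark node holds about twice its RAM and has to spill to disk.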


Twirrim 4276 days ago

3TB SATA would indicate spinning rust too, so slower storage. It's far from an apples to apples comparison.
