Since each node was handling 500GB of data (roughly), I think the disk speed may have been a more critical factor since each node had 244GB of memory. Their nodes used SSDs; the older nodes used spinning rust. The seek times alone will be a killer.
If only that were true -- the shuffle is typically seek-bound when the intermediate data doesn't fit into cache (plenty of papers show this pretty conclusively).
Doesn't that speak to his point? Either smaller memory and more nodes, or more memory and less nodes. Why not do apples to apples? (This feels like the benchmarketing going on in browsers, which, at this point, is largely meaningless.)
Edit: on the other hand, this is an endorsement of the current wave of "per node performance stinks, let's avoid rewriting software for an extra year or two by throwing SSDs at it." Great for hardware vendors!
The network part is probably the most important one here, and both have comparable network.