|
|
|
|
|
by discardorama
4275 days ago
|
|
As per the specification of this test, the data has to be committed on disk before it is considered sorted. So even if it all fits in memory, it has to be on disk before the end. So you have 100TB of disk read, followed by 100TB of disk write, all on HDDs. That's about 100GB/node; and since Hadoop nodes are typically in RAID-6, each write has an associated read and write too. This does not even include the intermediate files, which (depending on how the kernel parameters have been set), could have been written on disk. Typical dirty_background_ratio is 10; so after 6GB of dirty pages, pdflush will kick in and start writing to the spinning disk. |
|
Maybe you can email me offline. I can tell you more about the setup and how Spark / MapReduce works w.r.t. to it.