|
|
|
|
|
by discardorama
4276 days ago
|
|
It's interesting, but not earth-shattering. The "10x fewer nodes" means nothing; how powerful are the new nodes? What's the network? Do you use SSDs? etc. etc. They also tuned their code to this specific problem: "Exploiting Cache Locality: In the sort benchmark, each record is 100 bytes, where the sort key is the first 10 bytes. As we were profiling our sort program, we noticed the cache miss rate was high, because each comparison required an object pointer lookup that was random..... Combining TimSort with our new layout to exploit cache locality, the CPU time for sorting was reduced by a factor of 5." I would love to see MR and Spark compete on the exact same hardware configuration. |
|