by kayoust
4071 days ago
For most workloads, increasing the number of machines does not increase the amount of data sent over the network, so the ratio of computation to network bandwidth stays the same. As a result, increasing the cluster size doesn't make workloads any more I/O bound. At some point a larger cluster will be more bandwidth-constrained because of oversubscription, but given the network utilizations we saw (under 5% at the median in Figure 5), a cluster would have to be pretty heavily oversubscribed for the network to become the bottleneck. The one caveat is workloads such as matrix computations, where the data sent over the network increases with the number of machines.
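A rough back-of-the-envelope sketch of that argument, in Python. The function names and the model are illustrative assumptions, not something from the paper: it assumes compute and shuffle data split evenly across machines, and that link utilization scales linearly with the oversubscription factor.

```python
import math


def compute_to_network_ratio(total_compute_s, total_network_bytes, machines):
    """Compute-seconds per byte sent, per machine.

    Assumes (hypothetically) that both the work and the network data
    are divided evenly across the machines.
    """
    compute_per_machine = total_compute_s / machines
    bytes_per_machine = total_network_bytes / machines
    return compute_per_machine / bytes_per_machine


def oversubscription_to_saturate(utilization):
    """Oversubscription factor at which a link seen at `utilization`
    would reach 100%, assuming utilization grows linearly with the factor.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return 1.0 / utilization


# Same job on a small and a large cluster: the per-machine ratio is unchanged.
small = compute_to_network_ratio(1000.0, 1e9, machines=10)
large = compute_to_network_ratio(1000.0, 1e9, machines=100)
print(math.isclose(small, large))

# Rough oversubscription needed before a 5%-utilized network saturates.
print(oversubscription_to_saturate(0.05))
```

Under this simplified model the ratio only changes when the total network data itself grows with the machine count, which is the matrix-workload caveat.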