| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by anonymousDan 4071 days ago
	I'm not sure I completely buy their claim that network/disk IO is no longer the bottleneck in many situations. Their recent NSDI paper (https://kayousterhout.github.io/trace-analysis/) is compute bound for a 20 node cluster, but they never evaluate it in a larger cluster with fixed data size. The availability of on-demand cloud resources would potentially make it easier to solve the bottleneck by just increasing the cluster size.

3 comments

jandrewrogers 4071 days ago

It really depends on what you are doing. These days, assuming the code is well designed and highly efficient (not that common), you typically bottleneck on effective memory bandwidth or network bandwidth. To the extent this is not the case, it frequently is due to relatively poor efficiency of implementation in other parts of the system (extremely common).

Given a set of cheap SSDs and a good I/O scheduler, very few workloads should bottleneck on disk because the disk has more available bandwidth than the network. If you are bottlenecking on disk, it usually means the I/O scheduling is poor. The operating system I/O scheduler is quite poor for many use cases, hence why I/O intensive apps write their own.

Network can be inexpensively and easily saturated at 10 GbE these days, even when doing something ugly like parsing JSON streams. Unless there is a bottleneck elsewhere in the system, like memory bandwidth or occasionally CPU, this is where I typically see bottlenecks in real systems. However, switch fabrics do not scale infinitely even if you know what you are doing, so there is that, and bandwidth does not always scale linearly (though things like graph analysis are much closer to effectively linear in sophisticated implementations than you see using naive algorithms).

Data motion is always the bottleneck. Right now, moving from RAM to CPU or from machine to machine is almost always the bottleneck if you are doing everything else right. Many popular software designs and implementations for "big data" have many other bottlenecks that are not strictly necessary, so that throws off expectations e.g. poor I/O scheduling saturating disk or poor JSON parsing burning up CPU or poor use of threading wasting a lot of CPU time.

link

kayoust 4071 days ago

For most workloads, increasing the number of machines does not increase the amount of data sent over the network, so the ratio of computation to network bandwidth stays the same. As a result, increasing the cluster size doesn't make workloads any more I/O bound.

At some point, a larger cluster will be more bandwidth-constrained because of oversubscription, but given the network utilizations we saw (<5% at the median in Figure 5), a cluster would have to have pretty high oversubscription for the network to become the bottleneck.

The one caveat here is, for example, matrix workloads, where the data sent over the network increases with the number of machines.

link

anonymousDan 4071 days ago

Interesting. What about e.g. map-reduce workloads with large fan-in for the reduce step?

link

rxin 4071 days ago

Author of the blog post here. Kay already pointed out that increasing cluster size usually reduces network utilization. In addition to that, there a few challenges with just getting bigger cluster size with "on-demand" resources:

1. Obviously, it costs more to get a bigger cluster.

2. You might not be able to get as big of a cluster as you wanted, since "cloud" is not an infinite resource.

3. Some workloads are not embarrassing parallel. For some (graph, matrix, join), more parallelism -> more communication.

link