> Years ago David Cheriton at Stanford taught me something that seemed very obvious at the time -- that if you have a network link with low bandwidth then it's an easy matter of putting several in parallel to make a combined link with higher bandwidth, but if you have a network link with bad latency then no amount of money can turn any number of them into a link with good latency.
You can tell it's dated from this little tidbit:
> The Cable TV industry is hyping "cable modems" right now
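To put rough numbers on Cheriton's point, here is a minimal sketch. It assumes a deliberately simple model in which delivery time is one-way latency plus payload size divided by aggregate bandwidth. The function name and the example figures are made up for illustration.

```python
def transfer_time(size_bits, latency_s, bandwidth_bps, links=1):
    """Seconds to deliver a payload striped evenly over `links` identical links.

    Assumed model: one-way latency plus serialization time. Parallel links
    add bandwidth, but every bit still pays the same latency.
    """
    if links < 1:
        raise ValueError("links must be >= 1")
    return latency_s + size_bits / (bandwidth_bps * links)


if __name__ == "__main__":
    # Hypothetical link: 100 ms latency, 1 Mbit/s, sending 8 Mbit.
    for n in (1, 8, 64):
        print(n, "links:", round(transfer_time(8e6, 0.1, 1e6, n), 3), "s")
```

However many links you add, the result never drops below `latency_s`, which is the "no amount of money" part of the quote.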
Theoretically, if your bandwidth is high enough, you can transfer the entire computational state of the distant resource to a local substrate, and then run the computation locally for a low-latency conversation.
So, if you are annoyed by the slow comms of our alpha centauri - earth channel, just transfer _your entire brain_ to a local avatar and I'll converse with that. Then run "git merge" to bring the remote history back to the master repo.
@cgaebel The code for your blog seems to have a race condition in it where the content is all there on the page, it's just invisible, and only appears properly about 1 in 10 times. Either that or maybe your code is just hanging because I'm not allowing it to load Google Analytics like it wants.
This only works for "embarrassingly parallel" tasks, where you have lots of completely independent things to do. Please remember that the whole world is not a web server.
This is not trading latency for thruput. This is spending hardware for thruput, with some overhead in latency.
The relevant generic performance number is CPU-seconds (for CPU-bound work), or I/O consumption (for disk-bound work), or in general how much of your bottleneck resource is consumed. Once you know your bottleneck, you can either improve your code to use less of that resource, or buy more of that resource.
As tasks become less embarrassingly parallel, throwing (non-serial) hardware at a problem increases communication overhead, and gives lower speedups.
Another factor is that technology improvements tend to favor improvements in bandwidth over improvements in latency (and there are hard limitations on latency, like the speed of light in distributed systems). This short paper by David Patterson is a great read on the subject:
Well, in practice the latency that normally matters is between "start doing activity" and "finish doing activity", and the time actually working (that depends on bandwidth) is normally orders of magnitude bigger than the time waiting for data (that depends on latency). There are exceptions, but not many.
And now that I really thought about it, looks like the article's law isn't that relevant in practice. Yes, you can always trade latency for bandwidth if you throw some money at it. But money is finite.
> Well, in practice the latency that normally matters is between "start doing activity" and "finish doing activity", and the time actually working (that depends on bandwidth) is normally orders of magnitude bigger than the time waiting for data (that depends on latency).
I don't think that's true; moreover, it is likely to become even less true in the future. Once the data needed for a computation has arrived at the CPU, for most applications the required computation is pretty cheap -- the time spent waiting for the data often dominates the time spent computing with the data. Much of modern CPU design has been devoted to trying to hide the high latency of memory accesses.
Little's law certainly applies here. The article's author does have a good point though: if there is no queue and requests are processed one by one, your only option is good old serial program optimization.
Still, I think most large systems where latency is a major concern have some level of concurrency and/or queueing that can be manipulated to reduce latency.
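For anyone who wants the arithmetic, Little's law says L = λW: the average number of requests in the system equals throughput times average time in the system. A minimal sketch, assuming a stable system in steady state; the function name and the numbers are illustrative only.

```python
def mean_time_in_system(in_flight, throughput_per_s):
    """Little's law, L = lambda * W, solved for W (seconds per request).

    in_flight        -- average number of requests queued or being served (L)
    throughput_per_s -- average completion rate in requests/second (lambda)
    """
    if throughput_per_s <= 0:
        raise ValueError("throughput must be positive")
    return in_flight / throughput_per_s


if __name__ == "__main__":
    # Hypothetical service completing 50 requests/second.
    print(mean_time_in_system(100, 50))  # deep queue
    print(mean_time_in_system(10, 50))   # same throughput, shorter queue
```

At a fixed throughput, the only way to shrink W is to shrink the number of requests in flight, which is the "queueing that can be manipulated" knob. With one request in flight and no queue, that knob is gone.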
Performance is hard, but we should all be working together towards the things that matter the most. Latency is more important to optimize than throughput. Let's just focus on that.
I believe this is naive. First, it has no context. You optimize for what you need; there is no absolute best thing. If you have fantastic latency, but your throughput is not good enough to meet your needs, then, no, latency is not more important.
Second, while we tend to trade latency for throughput, it's not an even trade. That is, we tend to trade a small increase in latency for a large increase in throughput.
The insight the author had is a good one (although as others have pointed out, we've realized it before), but I think he oversold the conclusions.
> You optimize for what you need; there is no absolute best thing. If you have fantastic latency, but your throughput is not good enough to meet your needs, then, no, latency is not more important.
The whole point of this post is that if you have low latency, there are easy ways to trade it away for better throughput. Whereas there are not easy ways to trade high throughput for better latency. And, therefore, latency is fundamentally more important.
And my point is that absent knowing what your needs are, it's silly to talk about what is "more important". Because of the lopsided nature of the tradeoff (small latency harm for big throughput gain), it's dangerous to keep around "latency is more important" as a mantra.
"But it's always either reversing one of the tricks above, or exploiting some domain-specific parallelism."
That's kind of the point of a genuine tradeoff: you reverse what you did to improve one to improve the other. But when you say, "we tend to trade a small increase in latency for a large increase in throughput", you have the heart of what the author is complaining about. If you reverse a small increase of latency/large increase in throughput, you get a large decrease in throughput for only a small decrease in latency. To decrease latency, you have to do something else, making it the factor to watch.
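The lopsidedness both sides are describing can be seen in a toy batching model. This is only a sketch under stated assumptions: each batch pays a fixed overhead plus a per-item cost, batches are already full when they start (no time spent waiting to fill one), and every item in a batch completes when the batch does. The names and the 10 ms / 1 ms figures are hypothetical.

```python
def batch_stats(batch_size, per_batch_overhead_s, per_item_cost_s):
    """Return (latency_s, throughput_per_s) for one worker processing full batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch_time = per_batch_overhead_s + batch_size * per_item_cost_s
    return batch_time, batch_size / batch_time


if __name__ == "__main__":
    # Hypothetical costs: 10 ms fixed overhead per batch, 1 ms per item.
    for size in (1, 10, 100):
        latency, throughput = batch_stats(size, 0.010, 0.001)
        print(f"batch={size:3d} latency={latency * 1000:6.1f} ms "
              f"throughput={throughput:7.1f}/s")
```

Going from a batch of 1 to a batch of 10 under these assumptions costs less than 2x in latency and buys more than 5x in throughput. Read in the other direction, undoing the batching gives up most of the throughput for a modest latency win, which is the asymmetry being argued about.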
It's hard to write latency benchmarks for desktop apps. I remember this discussion happening when the "completely fair" scheduler was introduced to the Linux kernel a few years ago.
Amdahl's law is only related in the sense that they are both dealing with concurrency. It deals with diminishing returns due to parts of programs that can't take advantage of parallel processing.
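For reference, Amdahl's law in its usual form is speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can be parallelized and n is the number of workers. A small sketch (the function name is mine):

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when `parallel_fraction` of the work scales across `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 90% of the work parallelizable, speedup can never reach 10x.
    for n in (2, 10, 100, 10_000):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction sets a ceiling of 1 / (1 - p) no matter how many workers are added.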
The OP is saying that there are techniques that hurt latency and increase throughput, but not necessarily vice versa (except unwinding techniques going the other way), so it makes sense to focus on latency first.
It's an interesting theory, and I've filed it away as something to ponder, but I'm not sure what use cases it's valid for.
If you've got a stream of independent non-trivial requests, then the best throughput strategy is to process each request on one thread; almost all computing resources are focused on the problem at hand.
The best latency strategy is to devote all threads to each request in turn (assuming this gives a speedup). The communication overhead will increase the CPU cost of dealing with each request, hurting overall throughput (if you process two requests simultaneously, your latency will explode).
In this scenario, it's a simple tradeoff and thus doesn't seem to fit. Of course, nothing real world is this idealized, and the devil in the details is probably where the OP's law fits in.
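The idealized scenario can be written down as a toy model. This is a sketch, not a measurement: it assumes each request takes `work_s` seconds on one thread, that splitting a request across all threads scales perfectly apart from a flat `overhead` fraction of extra CPU work, and that requests never queue. All names and the 20% overhead figure are hypothetical.

```python
def one_request_per_thread(work_s, threads):
    """Each thread handles a whole request. Returns (latency_s, throughput_per_s)."""
    return work_s, threads / work_s


def all_threads_per_request(work_s, threads, overhead=0.2):
    """All threads cooperate on one request at a time.

    `overhead` is the assumed extra CPU cost of coordination, as a fraction
    of the original work. Returns (latency_s, throughput_per_s).
    """
    latency = work_s * (1.0 + overhead) / threads
    return latency, 1.0 / latency


if __name__ == "__main__":
    # Hypothetical: 1 second of work per request, 8 threads.
    print("per-thread :", one_request_per_thread(1.0, 8))
    print("all-threads:", all_threads_per_request(1.0, 8))
```

Under these assumptions the second strategy cuts latency by a factor of threads / (1 + overhead) and cuts throughput by a factor of (1 + overhead), so it really is a straightforward trade in this model.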