|
The interesting aspect of scaling up is that it doesn't matter how fast you are at individual single-core computation. Fast single-core computation, or even SIMD GPU processing, is largely an "easy" problem: get a stream of data going, or get a chunk of data into the system, and work away on it.
What makes scaling up hard is moving data around. Once you have more than a single computer, there is no easy way to share memory between them, so you have to impose some kind of copying for the system to work. If you want to demux a stream for multiple workers, you have to distribute work to the workers. If you have massive amounts of data in a cluster, you have to move the computation to the nodes in the cluster on which the data resides. Moving data around requires good orchestration of "mostly stateless" computations, with a couple of pinches of persistence strewn in as well.
You can do this well in any language, but what makes Erlang well suited for it is that it provides some decent primitives for you, with a lot of time sunk into the architecture. Beating this architecture in any other system requires you to invest that time yourself. And chances are the result isn't as general, so when the world around you changes, the framework you used is left behind.
Before Erlang, Tandem built hardware/software systems with many of the same ideas in them. They built these systems primarily for fault tolerance and robustness, but they found, somewhat to their surprise, that the same architecture is good at scaling. The reason, I believe after 10 years of Erlang programming, is mostly that the computation model of isolated services forces you to think distribution into the system from day one. Your solution naturally gravitates toward the distributed model, and this in turn means it is easier to scale out later. The model also makes it hard to accidentally build a part of the system which can slow down everything.
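To make the "isolated workers, copy the data to them" idea concrete, here is a minimal sketch in Python rather than Erlang. It is only an illustration under stated assumptions: threads and in-process queues stand in for isolated processes or cluster nodes, and the names `worker` and `count_words` are made up for this example. Each worker owns its state privately, and the only way in or out is a message.

```python
import queue
import threading
import zlib
from collections import Counter

STOP = None  # sentinel message telling a worker to shut down


def worker(inbox, outbox):
    """Hypothetical worker: private state, communicates only via messages."""
    counts = Counter()  # state owned by this worker alone
    while True:
        msg = inbox.get()
        if msg is STOP:
            break
        counts[msg] += 1
    outbox.put(dict(counts))  # send a copy of the result, not a shared reference


def count_words(stream, n_workers=4):
    """Demux a stream of words across workers and merge their results."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    threads = [threading.Thread(target=worker, args=(q, outbox)) for q in inboxes]
    for t in threads:
        t.start()
    for word in stream:
        # Route by a stable hash so the same key always lands on the same worker.
        idx = zlib.crc32(word.encode("utf-8")) % n_workers
        inboxes[idx].put(word)
    for q in inboxes:
        q.put(STOP)
    for t in threads:
        t.join()
    total = Counter()
    for _ in range(n_workers):
        total.update(outbox.get())  # merge the per-worker partial counts
    return dict(total)
```

Because no worker ever touches another worker's state, swapping the queues for network sockets changes the transport but not the structure, which is roughly the point about thinking distribution into the system from day one.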
I think this property of the model deserves more credit than it normally gets. And once you have your problem distributed, you call into that CUDA GPU code on the node to obtain the high computation speed. Or you call into your FPGA or DSP ASIC. Any problem on the CPU is slow because of its general-purpose behavior (the exception: you are Fabrice Bellard). |
Indeed, Jim Gray's (from Tandem) paper 'Why Do Computers Stop and What Can Be Done About It?' is quite good. It contains a detailed report of machine failures, covering both software and hardware, and details techniques for improving the MTBF in the face of both.
Erlang's language and runtime seem to have picked up seminal ideas from it...