| > I think that past a terabyte, there begin to be strong arguments to a distributed approach even if you can theoretically handle it on one machine I've elaborated on these in another comment, as well. Today, drawing the line at a single terabyte is way too early, even for all-in-memory workloads, if only because there exists an almost 4TB AWS instance now. Any smaller than 3.5TB (or whatever RAM is available to applications) is, at best, living in the past. > scalability if requirements change This reads as premature optimization, which turns the strong argument into either a weak argument or even an argument against. Now, if you know or have reasonable certainty that your requirements will change (and will do so faster than, say Moore's Law) and change soon, then that's different. I suspect there are people who think this, but that it's little more than wishful thinking or a delusion as to how large their slice of "web scale" actually is. > resilience to machine failures Machine failures just aren't a legitimate consideration for modern, high-end (but still commodity) hardware. You wouldn't bet your whole business on it, of course, but a 1% chance every year of losing an hour or two of batch processing? Sure. Sadly, the flip side of this is that I see Hadoop clusters being built with such reliable servers, including redudant PSUs and fans, instead of taking full advantage of the resilience at the software level in order to save as much as possible at the hardware level. The original company behind map-reduce is certainly not splurging on hardware. |
I can't really comment on rates of machine failures, but I have seen it happen before, even just for stupid reasons like someone in a data center unplugging a machine.