> a single machine may be harder to schedule to be close to fully allocated

That seems remarkably unlikely if there truly is only the one. Maybe I don't understand what you mean, though. How is that any different from having a single distributed cluster instead?

> if it goes down

I hear (read) this quite a bit, especially recently, and I'm a bit mystified that it's even brought up in this day and age.

Firstly, having worked with (what is now) commodity hardware for over a decade, I believe that people who haven't tend to grossly overestimate how often "it goes down", especially today. This overestimation is trotted out as a reason that operating one's own hardware is such a "nightmare" and that one must therefore always use cloud or a VPS.

With a "large" enough server, the risk goes up, of course. More DIMMs means more memory can fail, but we're still talking about a low single-digit percentage for all errors (including correctable ones). IIRC, CPUs have even lower failure rates. Everything else tends to be redundant.

Even then, your workflow isn't dead. It's just missing a DIMM or a CPU, possibly after a reboot (which won't be, if you configured it right).

In many cases, downtime isn't actually caused by hardware but by software (or humans). That's not going to be unique to centralized processing versus distributed.

Also, if the single machine is on a cloud provider, many hardware and utilization issues can be abstracted away anyway, for a huge price premium.
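To put rough shape on the "more DIMMs means more memory can fail" point, here is a minimal back-of-the-envelope sketch. It assumes component failures are independent, and the per-DIMM rate below is a made-up placeholder, not a measured figure; plug in whatever rate you trust for your hardware.

```python
def p_any_failure(per_unit_rate: float, units: int) -> float:
    """Chance that at least one of `units` independent components fails,
    given each fails with probability `per_unit_rate` over the same period."""
    return 1.0 - (1.0 - per_unit_rate) ** units


# HYPOTHETICAL per-DIMM failure probability per year (placeholder, not data).
assumed_dimm_rate = 0.001

for dimms in (4, 16, 64):
    risk = p_any_failure(assumed_dimm_rate, dimms)
    print(f"{dimms:>3} DIMMs: {risk:.2%} chance of at least one failure")
```

The risk grows roughly linearly with the DIMM count while the per-unit rate is small, so a bigger box is riskier than a smaller one, but not dramatically so under these assumptions.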
edit: if you're running on a cloud, you also may be able to autoscale to deal with spiky usage patterns.