
Fair enough. For in-memory only, if 1TB is your raw data, it's going to be bigger by the time it's indexed.

Surely, though, workloads that require in-memory performance are fairly niche, and jumping from there to distributed (even in-memory) seems non-obvious at best. Why aren't large arrays of fast SSDs a better alternative? The bandwidth is comparable to memory, but the latency is terrible (though still comparable to Ethernet to a remote node?).

What about workloads that don't require fully-in-memory in the first place? If the cutoff is, then, hundreds of TB, wouldn't that cover the vast majority of common use cases?

> I can't really comment on rates of machine failures, but I have seen it happen before, even just for stupid reasons like someone in a data center unplugging a machine.

That sort of anecdata isn't very useful, because a human can cause any failure at any layer, including someone stopping a whole cluster, which I've seen happen before.

My point about it not being a legitimate concern is that today's common practices, applied to today's common equipment, make it uncommon. These practices and equipment had to evolve, but that evolution happened over a decade ago.

Also, be wary of selection bias. It's very easy to remember the "fire drill" because of the one machine failure, and it makes a much more interesting story to tell, one that gets passed around and modified enough that it eventually sounds like multiple stories and therefore multiple machines. The hundreds of servers that operated unheard and unseen for years, sometimes beyond their specs (e.g. with only one blower out of four still turning, and at only half speed at that), get nary a thought, let alone a mention.