| HN Mirror

I think this is the other point that's so often missed/ignored in the "big data" discussion: there's a middle ground between everything-fits-in-memory and must-be-distributed.

The Adam Drake article alludes to it only in the last sentence by mentioning traditional relational databases as an alternative.

For workloads that are relatively I/O-heavy and CPU-light, it's very hard to beat local SSDs (or even HDDs in enough quantity) attached to a single [1] node, if the competition is distributed storage attached by ethernet. It only takes a couple 600MB/s SSDs to saturate a 10GE. A server with 48 lanes of PCIe 3 slots could take the I/O of 78 of them.

100Gb/s networks are getting close. For upwards of $1k per server (NIC and switch port) one can bring that ratio closer to 4:1 from 39:1. I'd expect this is the attractive route for anyone with CPU needs that can't be met by a 4-socket server.

[1] Yes, there can be more than one node with copies for redundancy, as has been complained about elsewhere in the sub-thread, or even scalability