|
|
|
|
|
by itronitron
2903 days ago
|
|
A lot of those > 1TB data sources are very standardized, they can be mapped to a schema, in which case indexing the data supports interactive queries and analytics. Hadoop seems well suited for data that needs to be processed in very different and changing ways. |
|
The Adam Drake article alludes to it only in the last sentence by mentioning traditional relational databases as an alternative.
For workloads that are relatively I/O-heavy and CPU-light, it's very hard to beat local SSDs (or even HDDs in enough quantity) attached to a single [1] node, if the competition is distributed storage attached by ethernet. It only takes a couple 600MB/s SSDs to saturate a 10GE. A server with 48 lanes of PCIe 3 slots could take the I/O of 78 of them.
100Gb/s networks are getting close. For upwards of $1k per server (NIC and switch port) one can bring that ratio closer to 4:1 from 39:1. I'd expect this is the attractive route for anyone with CPU needs that can't be met by a 4-socket server.
[1] Yes, there can be more than one node with copies for redundancy, as has been complained about elsewhere in the sub-thread, or even scalability