Hacker News
by wora 2718 days ago
A large data center has 20-30k machines. Amazon disclosed that back in 2014. Most likely you want to partition the machines into chunks, so each application or database uses only some of them. That way you don't need to talk to many machines at once.

In general, a distributed database can be really fast. DynamoDB claims 3ms latency (AWS re:Invent 2014). Current technology can probably get close to 1ms, which is comparable to memcached. With that kind of performance, you don't really need local storage.

You can get much faster IO if you use many local SSDs. The downside is utilization. It is very rare for a single machine to have a workload that fully utilizes its local disks. You end up greatly over-provisioning across the fleet to get high performance. A managed database over a network is more likely to make full use of disk/SSD throughput.
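
To make the utilization point concrete, here is a rough back-of-the-envelope sketch. All numbers are made up for illustration (four machines, each with a bursty workload that peaks at a different time), and the function names are hypothetical. It also assumes the peaks are perfectly staggered and ignores network overhead, so treat it as a best case for pooling rather than a measurement.

```python
# Hypothetical numbers, chosen only to illustrate the utilization argument.
# Each row is one machine's IO demand (thousands of IOPS) over four time slots.
demand = [
    [90, 10, 10, 10],  # machine A peaks in slot 0
    [10, 90, 10, 10],  # machine B peaks in slot 1
    [10, 10, 90, 10],  # machine C peaks in slot 2
    [10, 10, 10, 90],  # machine D peaks in slot 3
]


def local_provisioning(demand):
    """Local SSDs: every machine must be sized for its own peak."""
    return sum(max(row) for row in demand)


def pooled_provisioning(demand):
    """Shared storage tier: only the peak of the summed demand must be covered."""
    return max(sum(slot) for slot in zip(*demand))


def utilization(demand, capacity):
    """Average demand per time slot divided by provisioned capacity."""
    slots = len(demand[0])
    total = sum(sum(row) for row in demand)
    return total / (slots * capacity)


local = local_provisioning(demand)    # 360 (4 machines x 90)
pooled = pooled_provisioning(demand)  # 120 (90 + 10 + 10 + 10 in every slot)

print("local capacity:", local, "utilization:", round(utilization(demand, local), 2))
print("pooled capacity:", pooled, "utilization:", round(utilization(demand, pooled), 2))
```

With these toy numbers the local-SSD fleet provisions three times the throughput of the pooled tier and sits at about a third utilization. Real workloads overlap more than this, so the gap would be smaller in practice.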