by drv 5716 days ago
I haven't thought about it deeply, but my first guess would be that the seek time of hard disks is what makes serving 5000 requests per second difficult; consumer-grade hard disks can perform something in the range of a few hundred (randomly distributed) IOPS at best due to seek time. SSDs make seeks (almost) free, so even one decent SSD should be able to service 5000 requests per second, assuming you can get a big enough one.
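To put rough numbers on the seek argument, here is a back-of-the-envelope sketch. It assumes each random read costs one average seek plus half a platter rotation and ignores transfer time and command queueing. The seek times and spindle speeds are illustrative assumptions, not measurements, and `random_iops` is a made-up helper name.

```python
def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Estimate random-read IOPS for one spinning disk.

    Each random read pays an average seek plus, on average, half a
    platter rotation; transfer time is ignored for small reads.
    """
    half_rotation_ms = 0.5 * 60_000 / rpm
    return 1000.0 / (avg_seek_ms + half_rotation_ms)


# Assumed figures, not measurements: 9 ms seek at 7200 RPM, 3.5 ms at 15000 RPM.
desktop = random_iops(avg_seek_ms=9.0, rpm=7200)
fast = random_iops(avg_seek_ms=3.5, rpm=15000)
print(f"7200 RPM: ~{desktop:.0f} IOPS, 15000 RPM: ~{fast:.0f} IOPS")
```

Queueing and keeping data on a narrow band of the platter can improve on this simple model, but under these assumptions a single spinning disk stays far below 5000 random reads per second.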

If multiple storage devices (hard disks or otherwise) are allowed, RAID (or just splitting the data across multiple disks) would ameliorate the seek problem somewhat, since seeks can then happen in parallel. I wouldn't see this problem as requiring a distributed solution, unless the bottleneck is the bus or storage controller rather than the single-disk seek performance.
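As a rough sketch of the "split the data across multiple disks" idea: if every request costs one random read and keys hash evenly, the number of disks needed is just the target rate divided by per-disk IOPS. The 200 IOPS figure and the names `disks_needed` and `disk_for_key` are assumptions for illustration only.

```python
import math
import zlib


def disks_needed(target_rps: int, per_disk_iops: int) -> int:
    """Minimum disks if each request costs one random read and load spreads evenly."""
    return math.ceil(target_rps / per_disk_iops)


def disk_for_key(key: str, num_disks: int) -> int:
    """Pick a disk by hashing the key, so lookups fan out across spindles."""
    return zlib.crc32(key.encode("utf-8")) % num_disks


n = disks_needed(target_rps=5000, per_disk_iops=200)  # 200 IOPS is an assumed figure
print(n, disk_for_key("user:42", n))
```

This ignores hot keys and controller or bus limits, which is exactly where the single-machine approach could stop scaling.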

1 comment

Spreading the data over multiple disks on different physical servers (instead of RAID) would work too.

You could also try increasing throughput with an in-memory cache. The index/hash table can live in main memory as well, so a lookup costs at most one disk read. Depending on the keys, locality can play a crucial role in prefetching data.
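A minimal sketch of that combination, assuming values sit in a single data file and the whole key-to-offset index fits in RAM. `CachedStore`, its file layout, and the LRU eviction policy are assumptions made for this example, not something specified in the thread.

```python
from collections import OrderedDict


class CachedStore:
    """Key-value reads with an in-memory offset index and an LRU value cache."""

    def __init__(self, path, index, capacity):
        self.path = path            # data file holding the values
        self.index = index          # key -> (byte offset, length), kept in RAM
        self.capacity = capacity    # max number of cached values
        self.cache = OrderedDict()  # key -> value, oldest entry first
        self.disk_reads = 0         # counts cache misses that hit the disk

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)      # mark as most recently used
            return self.cache[key]
        offset, length = self.index[key]     # KeyError if the key is unknown
        with open(self.path, "rb") as f:     # one seek + one read per miss
            f.seek(offset)
            value = f.read(length)
        self.disk_reads += 1
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used
        return value
```

With a skewed key distribution most requests never touch the disk, so the effective request rate can be well above the raw IOPS of the storage. Prefetching neighbouring keys on a miss would be a natural extension but is not shown here.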