by codeslinger 5708 days ago
Why have SSDs killed this question? Seems like you could tweak the amount of data on disk and/or the size of the values and/or the requests/second requirement and keep using it. Also, there are no 1TB SSDs available at this point, so you'd still have to assume spinning platters for this, no?

I haven't thought about it deeply, but my first guess would be that the seek time of hard disks is what makes serving 5000 requests per second difficult; consumer-grade hard disks can perform something in the range of a few hundred (randomly distributed) IOPS at best due to seek time. SSDs make seeks (almost) free, so even one decent SSD should be able to service 5000 requests per second, assuming you can get a big enough one.
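To put rough numbers on the seek-time argument, here is a back-of-envelope sketch. The 9 ms average seek and 7200 RPM figures are my own illustrative assumptions for a consumer drive, not measurements, and the model assumes every request costs exactly one random disk read with no caching.

```python
import math

def random_iops(avg_seek_ms: float, rpm: int) -> float:
    """Rough random-read IOPS for one spinning disk.

    Each request is modeled as one average seek plus half a rotation
    (the average rotational latency). Transfer time is ignored.
    """
    rotational_latency_ms = 0.5 * 60_000 / rpm
    return 1000.0 / (avg_seek_ms + rotational_latency_ms)

def disks_needed(target_rps: float, per_disk_iops: float) -> int:
    """Disks required if requests spread evenly and each needs one read."""
    return math.ceil(target_rps / per_disk_iops)

# Assumed figures for a consumer 7200 RPM drive (illustrative only).
hdd_iops = random_iops(avg_seek_ms=9.0, rpm=7200)
print(round(hdd_iops), disks_needed(5000, hdd_iops))
```

With those assumed inputs a single disk lands at roughly 76 random reads per second, so hitting 5000 requests per second would take on the order of 66 such disks. Faster drives shift the numbers but not the conclusion.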

If multiple storage devices (hard disks or otherwise) are allowed, RAID (or just splitting the data across multiple disks) would ameliorate the seek problem somewhat, since seeks can then happen in parallel. I wouldn't see this problem as requiring a distributed solution, unless the bottleneck is the bus or storage controller rather than the single-disk seek performance.
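A minimal sketch of the "just split the data across multiple disks" option, assuming keys are byte strings and each disk holds one shard. The function name `disk_for_key` and the choice of CRC32 are mine, picked only because CRC32 is stable across runs; any stable hash would do.

```python
import zlib
from collections import Counter

def disk_for_key(key: bytes, num_disks: int) -> int:
    """Pick a disk for a key using a stable hash.

    The same key always maps to the same disk, so a lookup touches
    exactly one device and different lookups can seek in parallel.
    """
    return zlib.crc32(key) % num_disks

# Hypothetical workload: 10,000 synthetic keys spread over 8 disks.
counts = Counter(disk_for_key(f"key-{i}".encode(), 8) for i in range(10_000))
print(sorted(counts.items()))
```

If the keys hash evenly, each disk only has to serve about 1/N of the requests, which is where the parallel-seek win comes from.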

Spreading the data over multiple disks in different physical servers (instead of RAID) would work too.

You could also try increasing the throughput with an in-memory cache. The index/hash table can live in main memory as well. Depending on the keys, locality can play a crucial role in prefetching data.
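Roughly what that could look like, as a sketch rather than a real design: the index maps each key to an (offset, length) pair held in RAM, and a small LRU cache sits in front of the disk. The class name `CachedStore`, the index layout, and the `io.BytesIO` stand-in for the data file are all assumptions for illustration.

```python
import io
from collections import OrderedDict

class CachedStore:
    """Key-value reads backed by one data file, an in-RAM index, and an LRU cache."""

    def __init__(self, data_file, index, cache_size=1024):
        self.data_file = data_file    # seekable binary file-like object
        self.index = index            # key -> (offset, length), kept in memory
        self.cache = OrderedDict()    # key -> value, least recently used first
        self.cache_size = cache_size
        self.disk_reads = 0           # counts reads that actually hit the file

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        offset, length = self.index[key]    # index lookup costs no disk I/O
        self.data_file.seek(offset)
        value = self.data_file.read(length)
        self.disk_reads += 1
        self.cache[key] = value
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value

# Tiny in-memory stand-in for the on-disk data file.
store = CachedStore(io.BytesIO(b"applebanana"), {"a": (0, 5), "b": (5, 6)}, cache_size=1)
```

With the index in memory, a cache miss costs one seek instead of two (one for the index, one for the value), and a cache hit costs none.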

This may come as a shock, but you can put more than one SSD in a server :-P
I think that the parent thought that the system had to use a single SSD. The first time I read the question I thought that I had to use the hard drive that the data came on, and that it was all I was allowed to use. I didn't realize that the point was to design a system to simply serve the data that was given to us. After rereading it, it becomes pretty clear that a simple RAID 5 array with a few SSDs makes this problem moot.