Hacker News
by CobrastanJorji 1052 days ago
It also explains some of the cost model for cloud storage. The best possible customer, from a cloud storage perspective, stores a whole lot of data but reads almost none of it. That's kind of like renting hard drives, except that if you fill only part of each drive with "cold" data, you can still use that drive's full I/O capacity to serve the hot data. So if you very carefully balance which data sits on which drive, you can keep all of the drives busy even though most of your data is rarely read. That's part of why storage is comparatively cheap but reads are comparatively expensive.
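A toy sketch of the balancing idea, assuming made-up drive sizes, IOPS budgets and per-object read rates. All names and numbers here (`Drive`, `Obj`, `first_fit`, `heat_aware`, 1000 GB / 100 IOPS drives) are hypothetical, and this is not how any particular cloud provider actually places data. It compares filling drives in order against placing the hottest objects first, each onto the drive whose read load stays lowest:

```python
from dataclasses import dataclass, field


@dataclass
class Drive:
    name: str
    capacity_gb: float          # space on the drive (hypothetical number)
    iops_budget: float          # reads/sec the drive can serve (hypothetical)
    used_gb: float = 0.0
    load_iops: float = 0.0      # expected reads/sec of the data placed here
    objects: list = field(default_factory=list)

    def utilization(self) -> float:
        """Fraction of the drive's read capacity that is spoken for."""
        return self.load_iops / self.iops_budget


@dataclass
class Obj:
    name: str
    size_gb: float
    reads_per_sec: float        # expected read rate ("heat")


def _assign(drive: Drive, obj: Obj) -> None:
    drive.used_gb += obj.size_gb
    drive.load_iops += obj.reads_per_sec
    drive.objects.append(obj.name)


def first_fit(objects, drives):
    """Naive baseline: fill drives in order, ignoring how often data is read.
    Raises StopIteration if no drive has room for an object."""
    for obj in objects:
        target = next(d for d in drives if d.used_gb + obj.size_gb <= d.capacity_gb)
        _assign(target, obj)
    return drives


def heat_aware(objects, drives):
    """Hottest objects first, each onto the drive whose read load stays lowest.
    Raises ValueError if no drive has room for an object."""
    for obj in sorted(objects, key=lambda o: o.reads_per_sec, reverse=True):
        fits = [d for d in drives if d.used_gb + obj.size_gb <= d.capacity_gb]
        target = min(fits, key=lambda d: (d.load_iops + obj.reads_per_sec) / d.iops_budget)
        _assign(target, obj)
    return drives


def make_drives():
    return [Drive(f"d{i}", capacity_gb=1000, iops_budget=100) for i in range(4)]


# A few hot objects and lots of cold ones (made-up workload).
objects = ([Obj(f"hot-{i}", 10, 80.0) for i in range(4)] +
           [Obj(f"cold-{i}", 10, 0.05) for i in range(360)])

for strategy in (first_fit, heat_aware):
    drives = strategy(objects, make_drives())
    print(strategy.__name__, [round(d.utilization(), 2) for d in drives])
```

With this workload, first-fit stacks all four hot objects on the first drive and asks it for more than three times its read budget. The heat-aware pass gives each drive one hot object and fills the rest of its space with cold data, so every drive is mostly full and none is over its I/O budget.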

You get similar properties/challenges in lots of multi-consumer storage scenarios. I learned a lot of similar lessons working on CDNs when it comes to object distribution and access rates.

If you're interested, go search for some of the published work from "Coho Data"; they had some great USENIX presentations IIRC. That was Andy Warfield's previous company, and they put an emphasis on effective tracking & prediction of IO workloads across very large datasets.