by epistasis
1059 days ago
Working in genomics, I've dealt with lots of petabyte-scale data stores over the past decade. Having used AWS S3, GCP GCS, and a raft of storage systems for co-located hardware (Ceph, Gluster, and an HP system whose name I have blocked from my memory), I have no small amount of appreciation for the effort that goes into operating these sorts of systems. And the benefit of sharing disk IOPS with untold numbers of other customers is hard to overstate. I hadn't heard the term "heat" as it's used in the article, but it's incredibly hard to mitigate on a single system. For our co-located hardware clusters, we would have to customize the batch systems to treat IO as an allocatable resource, the same as RAM or CPU, in order to manage it correctly across large jobs. S3 and GCS are super expensive, but the performance can be worth it. This sort of article is some of the best of HN, IMHO.