by hcrisp 2308 days ago
It's been a while since I worked on it, but I did get pyfive to read from S3 objects, either by wrapping the entire object (read into memory as a bytearray) in io.BytesIO, or by using a custom class that implemented peek, seek, etc. against the S3 object. The first method was better if you needed to read most of a large file; the second was better for reading a small subset of it. Note that pyfive is read-only; it does not support writing. Later I heard that I wouldn't need pyfive at all, since h5py now supports file-like objects. So your comment that there is no cloud bucket support isn't exactly true.
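A minimal sketch of the second approach, assuming a hypothetical `fetch(start, stop)` callable that returns bytes `[start, stop)` of the remote object. With boto3, such a callable could wrap `s3.get_object(Bucket=..., Key=..., Range=f"bytes={start}-{stop - 1}")["Body"].read()`. Wrapping the raw reader in `io.BufferedReader` supplies `peek` and read buffering. The first approach is simply `io.BytesIO(fetch(0, size))`. The result is the kind of file-like object that `h5py.File(f, "r")` accepts (h5py 2.9+); the exact method set pyfive expects is not confirmed here.

```python
import io
from typing import Callable


class RangeReader(io.RawIOBase):
    """Read-only, seekable view over a remote object.

    `fetch(start, stop)` is a hypothetical callable returning bytes
    [start, stop) of the object, e.g. one ranged GET per call.
    """

    def __init__(self, fetch: Callable[[int, int], bytes], size: int) -> None:
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos  # seeking is free locally; only reads hit the network
        return pos

    def readinto(self, buffer) -> int:
        # Fetch only the requested range, clipped at end of object.
        stop = min(self._pos + len(buffer), self._size)
        if stop <= self._pos:
            return 0  # EOF
        data = self._fetch(self._pos, stop)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        return n


# Usage with an in-memory stand-in for the S3 object.
blob = bytes(range(256)) * 4  # 1 KiB fake object
calls = []


def fake_fetch(start: int, stop: int) -> bytes:
    calls.append((start, stop))  # record each simulated ranged request
    return blob[start:stop]


f = io.BufferedReader(RangeReader(fake_fetch, len(blob)), buffer_size=64)
head = f.peek(4)[:4]          # look ahead without moving the position
f.seek(512)
chunk = f.read(16)            # triggers one small ranged fetch
f.seek(-4, io.SEEK_END)
tail = f.read()

# First approach: pull the whole object into memory once.
whole = io.BytesIO(fake_fetch(0, len(blob)))
```

Each ranged read costs one request, which is why this suits reading a small subset of a file, while the single whole-object fetch wins when most of the file is needed.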

To be clear, our experience using gcsfuse and friends to do basically the same thing was extremely painful and a performance nightmare. The HDF format was designed for a world where seeks are free, which makes cloud access very high latency and very low throughput.
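A back-of-the-envelope way to see the latency point, under a simple assumed model in which every request pays a fixed round-trip latency and the payload then streams at a fixed bandwidth. The 50 ms latency, 100 MB/s bandwidth, and 1,000 reads of 4 KiB are illustrative assumptions, not measurements of gcsfuse or any particular cloud store.

```python
def estimated_read_seconds(n_requests: int, total_bytes: int,
                           latency_s: float, bandwidth_bps: float) -> float:
    """Assumed model: each request pays one round trip, then bytes stream."""
    return n_requests * latency_s + total_bytes / bandwidth_bps


# Illustrative assumptions, not measurements.
LATENCY_S = 0.05          # 50 ms per request
BANDWIDTH_BPS = 100e6     # 100 MB/s
TOTAL_BYTES = 1000 * 4096  # the same ~4 MB of data either way

# Seek-heavy access: 1,000 separate small reads.
scattered = estimated_read_seconds(1000, TOTAL_BYTES, LATENCY_S, BANDWIDTH_BPS)
# Contiguous access: one request for all of it.
single = estimated_read_seconds(1, TOTAL_BYTES, LATENCY_S, BANDWIDTH_BPS)
# Throughput actually achieved in the seek-heavy case.
effective_mb_per_s = TOTAL_BYTES / scattered / 1e6
```

Under these assumptions the scattered pattern takes about 50 s versus about 0.09 s for one request, an effective rate below 0.1 MB/s: latency, not bandwidth, dominates once every seek becomes a network round trip.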