| This is basically exactly what we do: we have created a cloud optimised tar (cotar) [1] by building a hash index of the files inside the tar.

I work on serving tiled geospatial data [2] (Mapbox vector tiles) as slippy maps, which means serving millions of small (mostly <100KB) files to our users. Our data only changes weekly, so we precompute all the tiles and store them in a tar file in S3. We compute an index for the tar file, then use S3 range requests to serve the tiles. This means we can generally fetch a tile with 2 requests to S3, or 1 if the index is cached (generally ~20-50ms).

Full coverage of the world with Mapbox vector tiles is around 270M tiles and a ~90GB tar file, which can be computed from OpenStreetMap data [3].

> Though even that would only work with a subset of compression methods or no compression.

We compress the individual files as a workaround. There are options for indexing a compressed (gzip) tar file, but the benefits of a compressed tar vs compressed files are small for our use case.

[1] https://github.com/linz/cotar (or WIP Rust version https://github.com/blacha/cotar-rs)
[2] https://github.com/linz/basemaps or https://basemaps.linz.govt.nz
[3] https://github.com/onthegomap/planetiler |
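
For anyone who wants to see the idea in code, here is a minimal Python sketch of the approach described above. It is not cotar's actual index format or API: a plain dict stands in for the hash index, a byte slice stands in for the S3 range request, and the function names (`build_tar`, `build_index`, `range_header`, `read_tile`) and tile paths are made up for illustration. Each file is gzip-compressed on its own before going into an uncompressed tar, so a single byte range is enough to recover one tile.

```python
import gzip
import io
import tarfile


def build_tar(tiles: dict[str, bytes]) -> bytes:
    """Pack tiles into an uncompressed tar, gzip-compressing each file individually."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in tiles.items():
            payload = gzip.compress(data)
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def build_index(tar_bytes: bytes) -> dict[str, tuple[int, int]]:
    """Map each file name to (byte offset of its data, size in bytes)."""
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def range_header(offset: int, size: int) -> str:
    """Value for an HTTP Range header; the end byte is inclusive."""
    return f"bytes={offset}-{offset + size - 1}"


def read_tile(tar_bytes: bytes, index: dict[str, tuple[int, int]], name: str) -> bytes:
    """Look up a tile in the index and read only its bytes from the archive."""
    offset, size = index[name]
    # Slicing stands in for a ranged GET against the object store (assumption).
    compressed = tar_bytes[offset:offset + size]
    return gzip.decompress(compressed)


if __name__ == "__main__":
    # Hypothetical tile paths and contents, for illustration only.
    tiles = {"0/0/0.pbf": b"world tile", "1/0/1.pbf": b"another tile" * 10}
    archive = build_tar(tiles)
    index = build_index(archive)
    offset, size = index["1/0/1.pbf"]
    print(range_header(offset, size))
    print(read_tile(archive, index, "1/0/1.pbf") == tiles["1/0/1.pbf"])
```

With a real object store, the index lookup gives you the offset and size, and the `Range` header is all that is needed to fetch one tile. Caching the index is what brings the request count down from 2 to 1.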