Hacker News
by s17tnet 1384 days ago
Digging through their repos is interesting. They also proposed a Docker driver that lazily downloads image contents from CernVM-FS as the files in the overlay are accessed [0]; they claim a significant drop in process startup time.

[0] https://indico.cern.ch/event/567550/contributions/2627182/at...


Yeah, I work on that.

Startup is clearly faster because we don't download the whole image, especially compared with downloading all the layers and then starting the Docker container.

The main "trick" is that Docker images usually include a lot of files that are never accessed during normal operation, so most of the time there is no need to pull them.
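The idea of fetching files only when they are first read can be illustrated with a minimal Python sketch. It assumes the image metadata (paths and sizes) is available up front and that a hypothetical `fetch` callback downloads one file's contents; `LazyImage` and its methods are illustrative names, not the API of CernVM-FS or of the Docker driver mentioned above.

```python
from typing import Callable, Dict, List


class LazyImage:
    """Toy lazily populated image: a file's contents are downloaded only
    the first time it is read, then served from a local cache."""

    def __init__(self, index: Dict[str, int], fetch: Callable[[str], bytes]) -> None:
        self.index = index                # path -> size; metadata known up front
        self.fetch = fetch                # hypothetical: downloads one file's contents
        self.cache: Dict[str, bytes] = {}

    def listdir(self) -> List[str]:
        # Listings come from the metadata index alone; nothing is downloaded.
        return sorted(self.index)

    def read(self, path: str) -> bytes:
        if path not in self.index:
            raise FileNotFoundError(path)
        if path not in self.cache:
            # First access: fetch on demand and keep the result.
            self.cache[path] = self.fetch(path)
        return self.cache[path]

    def bytes_downloaded(self) -> int:
        return sum(len(data) for data in self.cache.values())


# Hypothetical image: a small entry point plus large files the job never opens.
remote = {
    "/usr/bin/app": b"x" * 2_000,
    "/usr/lib/libbig.so": b"y" * 500_000,
    "/usr/share/doc/manual.pdf": b"z" * 300_000,
}
calls: List[str] = []


def fetch(path: str) -> bytes:
    calls.append(path)  # record every simulated download
    return remote[path]


image = LazyImage({p: len(d) for p, d in remote.items()}, fetch)
print(image.listdir())
image.read("/usr/bin/app")
image.read("/usr/bin/app")  # second read is served from the cache
print(calls, image.bytes_downloaded())
```

Listing the tree costs nothing and only the one file that is read gets downloaded. A real implementation would also have to handle partial reads, cache eviction and integrity checks.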

Yep, I figured as much. I suppose your images are mostly made of large datasets to crunch, plus a smallish part with the R/Python/whatever code to execute.
The data is not part of the images. It's only the software. In the vast majority of cases, any particular data processing job requires only a tiny fraction of the available software. For instance, a few hundred MB out of a few tens of GB for a typical LHC application software release.
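To put "a tiny fraction" in numbers, here is a back-of-the-envelope estimate in Python. The 30 GB release, 300 MB working set and 100 MB/s link are assumed, illustrative values picked from the ranges quoted above, not measurements.

```python
MB = 10**6
GB = 10**9


def transfer_seconds(num_bytes: float, bandwidth_bytes_per_s: float) -> float:
    # Idealized transfer time: size divided by sustained bandwidth.
    return num_bytes / bandwidth_bytes_per_s


release_size = 30 * GB      # assumed: "a few tens of GB" of software
job_working_set = 300 * MB  # assumed: "a few hundred MB" actually read
bandwidth = 100 * MB        # assumed: ~100 MB/s link (roughly 1 Gbit/s)

fraction = job_working_set / release_size
eager = transfer_seconds(release_size, bandwidth)   # pull everything first
lazy = transfer_seconds(job_working_set, bandwidth)  # fetch only what is read
print(f"fraction used: {fraction:.1%}")
print(f"eager pull: {eager:.0f} s, lazy fetch: {lazy:.0f} s")
```

Under these assumptions a job touches about 1% of the release, and the idealized transfer drops from about 300 s to about 3 s. The model ignores per-file request latency and cache hits, both of which matter in practice.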
In practice that sounds like an excellent optimization, but in theory it annoys me that we're doing that rather than figuring out how to build better binaries.
I work on a platform that manages fleets of edge devices running a Linux-based OS, where applications are distributed as container images. Nvidia in particular is rather awful to support, since users with their hardware inevitably build 10+ GB images, largely composed of libraries and samples they'll never use. Plenty of other users are unaware that they can improve the speed and reliability of their deployments by trimming the fat from their images.

A lot of work goes into properly handling and optimizing the download and distribution of excessively large application images, often on slow and unreliable networks, when smaller is always faster and more reliable.
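One way to find the fat is to compare what is in an image against what the application actually opens at runtime. The sketch below assumes you have already extracted the image's root filesystem to a directory and collected a set of accessed paths (for example with a file-access tracer); `unused_files` and `demo` are hypothetical helpers, and the demo tree is made up.

```python
import os
import tempfile
from typing import Iterable, List, Set, Tuple


def unused_files(root: str, accessed: Iterable[str]) -> Tuple[int, List[Tuple[int, str]]]:
    """Walk an extracted rootfs and report regular files never opened at runtime.

    `accessed` holds paths relative to `root`; how it was collected is
    outside this sketch."""
    seen: Set[str] = {os.path.normpath(p) for p in accessed}
    unused: List[Tuple[int, str]] = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks take no space of their own
            rel = os.path.relpath(full, root)
            if rel not in seen:
                unused.append((os.path.getsize(full), rel))
    unused.sort(reverse=True)  # largest first: the best trimming candidates
    return sum(size for size, _ in unused), unused


def demo() -> Tuple[int, List[Tuple[int, str]]]:
    # Hypothetical rootfs: a binary and a library the app uses, plus unused samples.
    with tempfile.TemporaryDirectory() as root:
        files = {"bin/app": 1_000, "lib/libused.so": 4_000, "samples/demo.bin": 50_000}
        for rel, size in files.items():
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "wb") as f:
                f.write(b"\0" * size)
        return unused_files(root, accessed={"bin/app", "lib/libused.so"})


total, candidates = demo()
print(total, candidates)
```

Sorting by size puts the biggest removal candidates first. A trace from one run can miss files needed only on rarer code paths, so treat the output as a list to review rather than something to delete blindly.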

I'd love that for rescue media: just load what you need and mirror the rest of the image to RAM in the background.
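The "load what you need, mirror the rest in the background" pattern can be sketched as a foreground read path and a background copy thread sharing one in-RAM cache. This is a hypothetical Python illustration: `read_block` stands in for reads from the slow boot media and block-level granularity is an assumption; a real rescue system would do this at the block-device or filesystem layer.

```python
import threading
from typing import Callable, Dict, Optional


class PrefetchingImage:
    """Serve blocks on demand while a background thread copies the rest into RAM."""

    def __init__(self, num_blocks: int, read_block: Callable[[int], bytes]) -> None:
        self.num_blocks = num_blocks
        self.read_block = read_block  # hypothetical: slow read from USB/network media
        self.ram: Dict[int, bytes] = {}
        self.lock = threading.Lock()
        self.worker: Optional[threading.Thread] = None

    def _load(self, i: int) -> bytes:
        with self.lock:
            data = self.ram.get(i)
        if data is None:
            data = self.read_block(i)  # slow read happens outside the lock
            with self.lock:
                data = self.ram.setdefault(i, data)  # keep the first copy if both threads raced
        return data

    def get(self, i: int) -> bytes:
        # Foreground access: served right away, from RAM if already mirrored.
        return self._load(i)

    def start_mirror(self) -> None:
        # Background copy of every block so the media can eventually be removed.
        def run() -> None:
            for i in range(self.num_blocks):
                self._load(i)

        self.worker = threading.Thread(target=run, daemon=True)
        self.worker.start()

    def fully_mirrored(self) -> bool:
        with self.lock:
            return len(self.ram) == self.num_blocks


blocks = [bytes([i]) * 4096 for i in range(64)]
image = PrefetchingImage(len(blocks), read_block=lambda i: blocks[i])
image.start_mirror()
first = image.get(10)  # available whether or not the mirror has reached block 10
image.worker.join()
print(image.fully_mirrored())
```

Keeping the slow read outside the lock means a foreground request never waits behind the background copy. A smarter mirror might copy blocks near recent accesses first instead of going strictly in order.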
AppFS is similar, and I already have a Docker image called "rkeene/appfs" on Docker Hub.
We developed something similar in-house. For most images it gives a notable startup speedup.
Mind sharing what your in-house solution is? I have been working on something similar, using extracted layers on AFS with Podman's additional layer store.