Hacker News
by formerly_proven 105 days ago
The gzip compression of layers is actually optional in OCI images, but iirc not in legacy Docker images. The two formats are not the same. On SSDs, the overhead of building an index for a tar is not that high if we're primarily talking about large files (so the data/weights/CUDA layers rather than system layers). The approach from the article is of course still faster, especially for running many minor variations of containers, though I am wondering how common it is for only some parts of the weights to change. I would've assumed that most things you'll do with weights would change about 100% of them when viewed through 1M chunks. The lazy pulling probably has some rather dubious/interesting service latency implications.

The main annoyance imho with gzip here is that it was already slow when the format was new (unless you have Intel QAT and bothered to patch and recompile that into all the Go binaries which handle these, which you do not).
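
For anyone wondering what "building an index for a tar" amounts to, here is a rough sketch using Python's standard `tarfile` module. It assumes the layer is an uncompressed tar, and the helper names (`build_tar_index`, `read_member`, `make_demo_layer`) are made up for illustration, not taken from the article or any container runtime. The index costs one linear pass over the headers; after that, a file can be read by seeking straight to its data.

```python
import io
import tarfile


def build_tar_index(fileobj):
    """Map each regular file in an uncompressed tar to (data_offset, size)."""
    index = {}
    # mode "r:" means "read, no compression"; a gzip layer would be rejected here.
    with tarfile.open(fileobj=fileobj, mode="r:") as tar:
        for member in tar:
            if member.isfile():
                index[member.name] = (member.offset_data, member.size)
    return index


def read_member(fileobj, index, name):
    """Read one file's bytes by seeking directly, without rescanning the tar."""
    offset, size = index[name]
    fileobj.seek(offset)
    return fileobj.read(size)


def make_demo_layer():
    """Build a tiny in-memory tar that stands in for an image layer (hypothetical content)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, payload in [("etc/config", b"key=value\n"),
                              ("weights.bin", b"\x00" * 4096)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    layer = make_demo_layer()
    index = build_tar_index(layer)
    print(index)
    print(read_member(layer, index, "etc/config"))
```

With gzip in the way this direct seek is not possible without extra bookkeeping, which is part of why the compression choice matters here.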

1 comment

Yeah that’s fair. For weights specifically there often isn’t a huge dedupe win across versions since retraining tends to change most of them. That said, we generally don’t advocate including model weights in container images anyway. The main benefit for us is avoiding the need to pull the full image up front and only fetching the data actually touched during startup. On the latency side, reads happen over a local network with caching and prefetching, so the impact on request latency is typically minimal.
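
To make the dedupe point concrete, here is a minimal sketch of how one might estimate chunk reuse between two versions of a file. It assumes fixed-size chunks compared by SHA-256, with "1M" read as 1 MiB; the actual chunking scheme behind the article is not stated in this thread, and `chunk_hashes` and `reusable_fraction` are hypothetical names.

```python
import hashlib

CHUNK_SIZE = 1 << 20  # 1 MiB; an assumed reading of "1M chunks"


def chunk_hashes(data, chunk_size=CHUNK_SIZE):
    """Hash each fixed-size chunk of data (the last chunk may be shorter)."""
    return [hashlib.sha256(data[i:i + chunk_size]).digest()
            for i in range(0, len(data), chunk_size)]


def reusable_fraction(old, new, chunk_size=CHUNK_SIZE):
    """Fraction of chunks in `new` whose content already exists somewhere in `old`."""
    old_set = set(chunk_hashes(old, chunk_size))
    new_hashes = chunk_hashes(new, chunk_size)
    if not new_hashes:
        return 0.0
    reused = sum(1 for h in new_hashes if h in old_set)
    return reused / len(new_hashes)


if __name__ == "__main__":
    # Tiny chunk size so the effect is visible on toy data.
    old = b"aaaabbbbcccc"
    localized_change = b"aaaaXbbbcccc"  # one chunk touched
    spread_change = b"Xaaabbb?cccZ"     # every chunk touched, as retraining tends to do
    print(reusable_fraction(old, localized_change, chunk_size=4))
    print(reusable_fraction(old, spread_change, chunk_size=4))
```

A single changed byte invalidates its whole chunk, so a retrain that nudges most weights leaves almost nothing to reuse, while a localized change keeps most chunks shareable.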