Hacker News
by cosmotic 105 days ago
Why does the model data need to be stored in the image? Download the model data on container startup using whatever method works best.
2 comments

You are correct! From our tests, storing model weights in the image isn't the preferred approach once the weights exceed ~1GB. We run a distributed, multi-layer cache system to combat this, and we can load roughly 6-7GB of files with a p99 latency of <2.5s.
hey cosmotic, we're not really advocating for storing model weights in the container image.

even the smaller nvidia images (like nvidia/cuda:13.1.1-cudnn-runtime-ubuntu24.04) are about 2GB before adding any python deps, and that is a problem.

if you split the image into chunks and pull on-demand, your container will start much faster.
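to make the chunking idea concrete, here is a minimal sketch in Python, not how any particular runtime does it. it assumes fixed-size chunks keyed by SHA-256 digest; `split_into_chunks`, `LazyImage`, and the `fetch` callback are hypothetical names made up for illustration, and an in-memory dict stands in for the remote chunk store.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def split_into_chunks(blob: bytes, chunk_size: int) -> Tuple[List[str], Dict[str, bytes]]:
    """Split a blob into fixed-size chunks keyed by their SHA-256 digest.

    Returns (manifest, store): the manifest lists digests in order, and the
    store maps digest -> chunk bytes (a stand-in for a remote chunk store).
    """
    manifest: List[str] = []
    store: Dict[str, bytes] = {}
    for offset in range(0, len(blob), chunk_size):
        chunk = blob[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        store[digest] = chunk
    return manifest, store


class LazyImage:
    """Serves byte ranges of an image, fetching only the chunks a read touches."""

    def __init__(self, manifest: List[str], chunk_size: int, total_size: int,
                 fetch: Callable[[str], bytes]) -> None:
        self.manifest = manifest
        self.chunk_size = chunk_size
        self.total_size = total_size
        self.fetch = fetch              # hypothetical: digest -> chunk bytes
        self._cache: Dict[str, bytes] = {}
        self.fetch_count = 0            # number of chunks actually pulled

    def _chunk(self, index: int) -> bytes:
        digest = self.manifest[index]
        if digest not in self._cache:
            self._cache[digest] = self.fetch(digest)
            self.fetch_count += 1
        return self._cache[digest]

    def read(self, offset: int, length: int) -> bytes:
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        end = min(offset + length, self.total_size)
        if offset >= end:
            return b""
        first = offset // self.chunk_size
        last = (end - 1) // self.chunk_size
        data = b"".join(self._chunk(i) for i in range(first, last + 1))
        start = offset - first * self.chunk_size
        return data[start:start + (end - offset)]


# Usage: a 32-byte "image" in 8-byte chunks; only touched chunks are fetched.
blob = bytes(range(32))
manifest, store = split_into_chunks(blob, 8)
image = LazyImage(manifest, 8, len(blob), store.__getitem__)
first_read = image.read(10, 4)   # falls entirely inside chunk 1
```

the container can start as soon as the chunks it reads first are available, and the rest of the image never has to be pulled unless something actually touches it.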

Just pre-install the NVIDIA layer on the filesystem instead of docker-pulling it for every single machine.