After seeing the Docker image for vllm jump +5Gb (to 10Gb!) over the past five months, I grew suspicious of vllm's development practices [1]. It's not easy, for sure, to deal with all those flaky python modules [2].
But having the CUDA packages four times in different layers is questionable! [3]
Yet again, as a college mate of mine used to say, "Don't change it. It works."