|
|
|
|
|
by throw_me_uwu
233 days ago
|
|
Because ML infra is bloatware beyond belief. If it was engineered right, it would take: - transfer model weights from NVMe drive/RAM to GPU via PCIe - upload tiny precompiled code to GPU - run it with tiny CPU host code But what you get instead is gigabytes of PyTorch + Nvidia docker container bloatware (hi Nvidia NeMo) that takes forever to start. |
|