by Mernit
1015 days ago
Sharing GPUs only really makes sense for GPUs that are large enough to share. MIG partitions can work for 80 GB A100s but won't work with smaller cards like T4s. It also adds latency to the GPU operations. Unfortunately there's not yet a silver bullet for this stuff.
Model heterogeneity seems like a real challenge there. You could optimize usage if you know all the sizes ahead of time and actually have GPU capacity to do efficient allocations, but it's way harder than just doling out 1 GPU per pod.
e: also, latency because of reduced resources? Or what do you mean?
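On the "know all the sizes ahead of time" point, here's a rough sketch of what that allocation could look like. It assumes each model's memory footprint is known up front and uses plain first-fit-decreasing bin packing. The `pack_models` helper and the model sizes are made up for illustration. It ignores MIG's fixed slice sizes, compute contention, and anything a real scheduler would need to handle.

```python
from typing import Dict, List


def pack_models(model_gib: Dict[str, float], gpu_gib: float) -> List[Dict[str, float]]:
    """First-fit decreasing: place the largest models first, each onto the
    first GPU that still has enough free memory (hypothetical helper)."""
    gpus: List[Dict[str, float]] = []  # one dict per GPU: model name -> GiB used
    by_size = sorted(model_gib.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > gpu_gib:
            raise ValueError(f"{name} needs {size} GiB, more than a single GPU has")
        for gpu in gpus:
            if sum(gpu.values()) + size <= gpu_gib:
                gpu[name] = size
                break
        else:
            # No existing GPU has room, so allocate a new one.
            gpus.append({name: size})
    return gpus


# Illustrative sizes only (GiB), not measurements of real models.
models = {"a": 40, "b": 30, "c": 20, "d": 10, "e": 10, "f": 5}
placement = pack_models(models, gpu_gib=80)
for i, gpu in enumerate(placement):
    print(f"gpu{i}: {gpu} ({sum(gpu.values())} GiB used)")
```

With these made-up sizes the six models fit on two 80 GiB cards instead of six with one GPU per pod. That only works because every size is known before placement; with models arriving at unknown sizes, the packing gets much worse.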