|
|
|
|
|
by alexeldeib
1020 days ago
|
|
That’s why I was curious about utilization since you mentioned low memory usage. I believe time slicing can work on those smaller cards these days. Did you explore any other optimizations like batching or concurrency for same model? Model heterogeneity seems like a real challenge there — you could optimize usage if you know all the sizes ahead of times and actually have gpu capacity to do efficient allocations, but it’s way harder than just doling 1 gpu per pod. e: also, latency because of reduced resources? Or what do you mean? |
|