by pavelstoev
1128 days ago
You are often better off running certain workloads on lesser GPUs, but this requires some tricky compiler-level optimizations. For example, you can run certain LLM inference workloads on cheaper A40s with latency comparable to A100s. You could also run them on 3090s (sometimes even faster). This helps with operating costs and may also resolve availability constraints.