by pavelstoev 1128 days ago
Often you are better off running certain workloads on lesser GPUs, but this requires some tricky compiler-level optimizations. For example, you can run certain LLM inference workloads with comparable latency on cheaper A40s instead of A100s. You could also run them on 3090s (sometimes even faster). This helps with operating costs and may also resolve availability constraints.
1 comment

The A40 / A6000 and A5000 are great GPUs for single-GPU inference and training, and they provide better price/performance than the A100 for models that fit in memory.
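
A rough way to check the "models that fit" condition is to compare the weight footprint against each card's memory. This is only a back-of-the-envelope sketch: the function names and the 0.8 headroom factor (leaving about 20% for activations, KV cache, and framework overhead) are my own assumptions, not a rule from anyone's tooling, and real usage depends on batch size and sequence length. The memory sizes are the published capacities for each card.

```python
# Published memory capacity per card, in GB.
GPU_MEMORY_GB = {
    "A100-80GB": 80,
    "A100-40GB": 40,
    "A40": 48,
    "RTX A6000": 48,
    "RTX A5000": 24,
    "RTX 3090": 24,
}


def weights_gb(num_params_billion, bytes_per_param=2):
    """Weight footprint in GB (1e9 bytes). fp16/bf16 uses 2 bytes per parameter."""
    return num_params_billion * bytes_per_param


def gpus_that_fit(num_params_billion, bytes_per_param=2, headroom=0.8):
    """Cards whose memory covers the weights.

    headroom is a hypothetical safety factor: only this fraction of the
    card's memory is budgeted for weights.
    """
    need = weights_gb(num_params_billion, bytes_per_param)
    return [name for name, mem in GPU_MEMORY_GB.items() if need <= mem * headroom]


if __name__ == "__main__":
    print(gpus_that_fit(13))                     # 13B parameters in fp16
    print(gpus_that_fit(13, bytes_per_param=1))  # same model with 8-bit weights
```

Under these assumptions a 13B model in fp16 (26 GB of weights) fits on the 48 GB cards but not on a 24 GB A5000 or 3090, while 8-bit weights bring it within reach of the 24 GB cards too.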