Hacker News
by ijk 372 days ago
For LLM inference, running across multiple GPUs is mostly fine: you take some performance hit, but llama.cpp doesn't care which cards you use, and other tools handle 4 symmetric GPUs just fine. You run into more problems when you're doing anything training-related, though.