| HN Mirror



	by ijk 372 days ago
	For LLM inference, running multiple GPUs in parallel is mostly fine: you take some performance hit, but llama.cpp doesn't care which cards you use, and other stuff handles 4 symmetric GPUs just fine. You run into more problems when you're doing anything training-related, though.