Hacker News
by nickpsecurity 373 days ago
Don't the parallelizing techniques of a 4x build make using them more difficult than a 1x build with no extra parallelism? Couldn't the 32GB 4090 handle more models in their original configurations?

For LLM inference, parallel GPUs are mostly fine. You take some performance hit, but llama.cpp doesn't care what cards you use, and other software handles 4 symmetric GPUs just fine. You run into more problems when you're doing anything training-related, though.
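A rough illustration of why mismatched cards are workable for inference: with layer (pipeline) splitting, each GPU holds a contiguous block of layers sized to its memory. This sketch assumes layers are assigned in proportion to VRAM. `split_layers` and `layer_ranges` are hypothetical helpers, not llama.cpp's actual code, although llama.cpp exposes a similar idea through its `--tensor-split` proportions.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Hypothetical helper: give each GPU a layer count proportional to its VRAM."""
    total = sum(vram_gb)
    if total <= 0:
        raise ValueError("total VRAM must be positive")
    # Ideal (fractional) number of layers per GPU.
    shares = [n_layers * v / total for v in vram_gb]
    counts = [int(s) for s in shares]
    # Hand out leftover layers to the GPUs with the largest fractional remainder.
    remaining = n_layers - sum(counts)
    order = sorted(range(len(vram_gb)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:remaining]:
        counts[i] += 1
    return counts


def layer_ranges(counts: list[int]) -> list[tuple[int, int]]:
    """Turn per-GPU counts into contiguous [start, end) layer ranges."""
    ranges, start = [], 0
    for c in counts:
        ranges.append((start, start + c))
        start += c
    return ranges


# Four identical 24 GB cards vs. a mixed set (two 24 GB cards plus one 12 GB card).
print(split_layers(80, [24, 24, 24, 24]))              # [20, 20, 20, 20]
print(layer_ranges(split_layers(80, [24, 24, 12])))    # [(0, 32), (32, 64), (64, 80)]
```

The smaller card simply gets fewer layers. Because activations are handed from one GPU to the next between blocks, and each card waits on the previous one for a given token, this layout is one likely source of the performance hit, but it does not require the cards to match.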
> Don't the parallelizing techniques of a 4x build make using them more difficult than a 1x build with no extra parallelism?

For inference, no. For training, only slightly.