by freeqaz 652 days ago
How much do the NVLinks help in this case?

Do you have a rough estimate of how much this cost? I'm curious since I just built my own 2x 3090 rig and I wondered about going EPYC for the potential to have more cards (stuck with AM5 for cheapness though).

All in all I spent about $3500 for everything. I'm guessing this is closer to $12-15k? CPU is around $800 on eBay.

3 comments

My reason for going Epyc was PCIe lanes and cheaper enterprise SSDs via U.2/U.3. With AM5, you tap out the lanes with dual GPUs. Threadripper is preferable, but Epyc is about half the price, or even better if you go last gen.
Why do you need such high cross card bandwidth for inference? Are you hosting for a lot of users at once?
The Epyc boards make things way easier (I have 4 Epyc boards of various generations) because they have loads of x16 slots and you’re not screwing around with bifurcation and sketchy PCIe splitters. Another oft-forgotten item that consumes lanes is a 25 or 40Gb NIC, which you might find you want if you’re pushing big model files around to other machines or storage.
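To make the lane argument concrete, here is a rough tally in Python. It is only a sketch: the lane counts (24 usable on AM5, 128 on a single-socket Epyc) and the device list are assumptions for illustration, not figures from anyone's actual build, and real boards route lanes in ways this ignores.

```python
# Rough PCIe lane budget. Every number here is an assumption for illustration.
CPU_LANES = {
    "AM5 (assumed usable lanes)": 24,
    "Epyc (assumed, single socket)": 128,
}

# Hypothetical build: lanes each device would want at full width.
DEVICES = {
    "GPU 0": 16,
    "GPU 1": 16,
    "25/40Gb NIC": 8,
    "U.2 SSD 0": 4,
    "U.2 SSD 1": 4,
}


def lane_shortfall(available: int, devices: dict) -> int:
    """Return how many lanes are missing (0 if everything fits at full width)."""
    wanted = sum(devices.values())
    return max(0, wanted - available)


for platform, lanes in CPU_LANES.items():
    print(platform, "shortfall:", lane_shortfall(lanes, DEVICES))
```

A nonzero shortfall is where you end up dropping GPUs to x8, hanging devices off the chipset, or bifurcating.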
I tried this w/ AM5, but realized that despite there theoretically being enough lanes for dual x16 PCI-e 4.0 GPUs, I couldn't find any motherboards that are actually configured this way, since dual-GPU is dead in consumer for gaming.
I built this in early 2023 out of used parts and ended up with a cost of €2300 for AM4 / 128 GB / 2x 3090 @ PCIe 4.0 x8 + NVLink.
I haven't been able to find a good answer on what difference NVLink makes or which applications support it.
NVLink is what makes multi-GPU work. It lets the GPUs talk to each other across a high-bandwidth, low-latency link (up to 600 GB/s on A100-class cards; the 3090 bridge is about 112.5 GB/s). TensorFlow and PyTorch both support it, among other things. It's not some weird side note: the interconnect between nodes is what makes a supercomputer super. You don't hear about it much because you don't hear about a lot of the details of supercomputer stuff in mainstream media.
Thank you, but this doesn't really answer OP's question or mine. Is NVLink required if you want to run an LLM that exceeds the memory of a single GPU? What are the benchmark comparisons with and without it?

I've heard that NVLink helps with training, but not so much with inferencing.
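
A back-of-envelope sketch of why that might be, not a benchmark. It assumes the model is split by layers across two GPUs, so each generated token only sends one hidden-state vector across the link. The hidden size, fp16 values, and link speeds below are all assumed numbers, and per-transfer latency is ignored. Tensor-parallel setups and training exchange far more data per step, so this reasoning does not carry over to them.

```python
def activation_bytes_per_token(hidden_size: int, bytes_per_value: int = 2) -> int:
    """Bytes crossing the GPU boundary per token for a layer-wise split (fp16 by default)."""
    return hidden_size * bytes_per_value


def transfer_microseconds(num_bytes: int, link_gb_per_s: float) -> float:
    """Pure bandwidth time in microseconds; ignores latency and protocol overhead."""
    return num_bytes / (link_gb_per_s * 1e9) * 1e6


# Assumed values for illustration only.
HIDDEN_SIZE = 8192            # hypothetical model width
LINKS_GB_PER_S = {
    "PCIe 4.0 x8 (approx.)": 16.0,
    "PCIe 4.0 x16 (approx.)": 32.0,
}

payload = activation_bytes_per_token(HIDDEN_SIZE)
for name, speed in LINKS_GB_PER_S.items():
    print(f"{name}: {payload} bytes, {transfer_microseconds(payload, speed):.3f} us per token")
```

Under these assumptions the per-token transfer is a few kilobytes, which is tiny next to the time spent reading weights from VRAM, so the link speed barely shows up for single-user inference.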