| HN Mirror


by semi-extrinsic 377 days ago

> save on the heartbreak of buying an RTX 5090 only to find that even that doesn’t help much with LLM inference and we’re all gonna need the cheaper-but-more-VRAM Intel Arc B60s

When going for more VRAM, with an RTX 5090 currently sitting at $3000 for 32GB, I'm curious why people aren't trying to get the Dell C4140s. Those seem to go for $3000-$4000 for the whole server with 4x V100 16GB, so 64GB total VRAM.

Maybe it's just because they produce heat and noise like a small turbojet.
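To put the price comparison above in dollars per GB of VRAM, here is a quick back-of-the-envelope sketch. It uses only the figures quoted in the comment ($3000 for a 32GB RTX 5090, $3000-$4000 for a C4140 with 4x V100 16GB); the helper name `usd_per_gb` is made up for illustration, and the calculation ignores power, noise, and per-card speed.

```python
# Rough dollars-per-GB-of-VRAM comparison using the prices quoted in the comment.
def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Return the price divided by total VRAM in GB (hypothetical helper)."""
    return price_usd / vram_gb

rtx_5090 = usd_per_gb(3000, 32)        # single card, 32 GB
c4140_low = usd_per_gb(3000, 4 * 16)   # whole server, low end of the quoted range
c4140_high = usd_per_gb(4000, 4 * 16)  # whole server, high end of the quoted range

print(f"RTX 5090:   ${rtx_5090:.2f}/GB")
print(f"Dell C4140: ${c4140_low:.2f}-${c4140_high:.2f}/GB")
```

On these numbers the used server comes out at roughly half to two-thirds the cost per GB of the single card.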

1 comments

nickpsecurity 377 days ago

Don't the parallelization techniques needed for a 4x build make it harder to use than a 1x build with no extra parallelism? Couldn't the 32GB 5090 handle more models in their original configurations?


ijk 376 days ago

For LLM inference, running across parallel GPUs is mostly fine (you take some performance hit, but llama.cpp doesn't care what cards you use, and other stuff handles 4 symmetric GPUs just fine). You get more problems when you're doing anything training-related, though.
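One way to picture why mismatched cards are workable for inference: a model's layers can be handed out to GPUs in proportion to each card's memory. The sketch below illustrates that idea only; it is not llama.cpp's actual code, and `split_layers` plus the layer counts are assumptions for the example.

```python
def split_layers(n_layers: int, vram_gb: list[float]) -> list[int]:
    """Assign a layer count to each GPU in proportion to its VRAM (hypothetical helper)."""
    total = sum(vram_gb)
    # Proportional share, rounded down.
    counts = [int(n_layers * v / total) for v in vram_gb]
    # Hand out the layers lost to rounding, one at a time, largest GPUs first.
    order = sorted(range(len(vram_gb)), key=lambda i: vram_gb[i], reverse=True)
    for i in range(n_layers - sum(counts)):
        counts[order[i % len(order)]] += 1
    return counts

print(split_layers(32, [16, 16, 16, 16]))  # four symmetric 16 GB cards
print(split_layers(80, [32, 16]))          # two mismatched cards
```

With four identical cards the split is even; with mismatched cards the larger one simply takes more layers.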


zargon 377 days ago

> Don't the parallelizing techniques of a 4x build make using them more difficult than a 1x build with no extra parallelism?

For inference, no. For training, only slightly.
