Hacker News
by Uehreka 376 days ago
Love the attention to detail, I can tell this was a lot of work to put together and I hope it helps people new to PC building.

I will note though, 12GB of VRAM and 32GB of system RAM is a ceiling you’re going to hit pretty quickly if you’re into messing with LLMs. There’s basically no way to do a better job at the budget you’re working with though.
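
To make the 12GB ceiling concrete, here is a rough back-of-the-envelope sketch. It only counts the model weights (parameter count times bytes per weight) and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The model sizes and bit widths below are illustrative assumptions, not figures from the article.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone, in decimal GB.

    Weights-only estimate: excludes KV cache, activations, and overhead.
    """
    return params_billion * bits_per_weight / 8


VRAM_GB = 12  # the GPU memory mentioned above

# Hypothetical model sizes and precisions, chosen only for illustration.
for params_billion, bits in [(7, 16), (7, 4), (70, 4)]:
    size = weights_gb(params_billion, bits)
    verdict = "fits" if size <= VRAM_GB else "does not fit"
    print(f"{params_billion}B at {bits}-bit: ~{size:.1f} GB of weights, {verdict} in {VRAM_GB} GB")
```

Under these assumptions, a 7B model at 16-bit precision already exceeds 12GB on weights alone, and only quantized variants leave headroom.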

One thing I hear about a lot is people using services like RunPod to briefly rent powerful GPUs/servers when they need one. At $2/hr you can get access to an H100. With a budget of $1300, that gets you about 650 hours of compute time, which (unless you’re doing training runs) should last you several months.
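
The rental math is easy to check with a quick sketch. The $1300 budget and $2/hr rate come from the comment; the 25 hours of use per week is purely an assumption for illustration, so plug in your own number.

```python
def rental_hours(budget_usd: float, rate_usd_per_hour: float) -> float:
    """Hours of GPU time a fixed budget buys at a flat hourly rate."""
    return budget_usd / rate_usd_per_hour


def weeks_of_use(total_hours: float, hours_per_week: float) -> float:
    """How long the rented hours last at a steady weekly usage."""
    return total_hours / hours_per_week


hours = rental_hours(1300, 2.0)   # budget and rate from the comment above
weeks = weeks_of_use(hours, 25)   # 25 h/week is an assumed usage pattern

print(f"{hours:.0f} hours of compute, about {weeks:.0f} weeks at 25 h/week")
```

At that assumed usage the budget lasts roughly half a year; heavier use shortens it proportionally.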

In several months' time the specs required to run good models will be different again in ways that are hard to predict, so this approach can help save on the heartbreak of buying an RTX 5090 only to find that even that doesn’t help much with LLM inference and we’re all gonna need the cheaper-but-more-VRAM Intel Arc B60s.

2 comments

I don't understand why some people build a "rig", put a lot of thought into ever-so-slightly differently binned CPUs, and then don't max out the RAM (DDR5 quirks aside). It's like buying a sports car only to cheap out on tires. It makes no sense.
I built my current computer last fall. The Ryzen 7950X was on an awesome sale for Black Friday, and after looking at the math, buying a 9950X just didn’t make sense. So I got the 7950X and 96GB of DDR5 RAM (2 sticks, so I can double later if I need to). Loving it, it was the perfect choice.

All this to say some people do in fact do this ;)

> save on the heartbreak of buying an RTX 5090 only to find that even that doesn’t help much with LLM inference and we’re all gonna need the cheaper-but-more-VRAM Intel Arc B60s

When going for more VRAM, with an RTX 5090 currently sitting at $3000 for 32GB, I'm curious why people aren't trying to get the Dell C4140s. Those seem to go for $3000-$4000 for the whole server with 4x V100 16GB, so 64GB total VRAM.

Maybe it's just because they produce heat and noise like a small turbojet.
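
For a rough sense of the trade-off, here is the cost per GB of VRAM using only the prices quoted in this comment. It deliberately ignores everything else (GPU generation, memory bandwidth, power, noise), so treat it as a sketch of one dimension of the comparison.

```python
def usd_per_gb(price_usd: float, vram_gb: float) -> float:
    """Price divided by total VRAM, in dollars per GB."""
    return price_usd / vram_gb


# Prices and capacities as quoted in the comment above.
rtx_5090 = usd_per_gb(3000, 32)
c4140_low = usd_per_gb(3000, 4 * 16)   # 4x V100 16GB, low end of the price range
c4140_high = usd_per_gb(4000, 4 * 16)  # same server, high end of the price range

print(f"RTX 5090:   ${rtx_5090:.2f}/GB")
print(f"Dell C4140: ${c4140_low:.2f} to ${c4140_high:.2f}/GB")
```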

Don't the parallelizing techniques of a 4x build make using them more difficult than a 1x build with no extra parallelism? Couldn't the 32GB 5090 handle more models in their original configurations?
For LLM inference, running parallel GPUs is mostly fine (you take some performance hit, but llama.cpp doesn't care what cards you use, and other software handles 4 symmetric GPUs just fine). You get more problems when you're doing anything training-related, though.
> Don't the parallelizing techniques of a 4x build make using them more difficult than a 1x build with no extra parallelism?

For inference, no. For training, only slightly.