by vardump
261 days ago
So the 235B-parameter Qwen3-VL is FP16, so in practice it requires at least 512 GB of RAM to run? Possibly even more for a reasonable context window? Assuming I don’t want to run it on a CPU, what are my options for running it at home for under $10k? Or, if my only option is to run the model on a CPU (vs. a GPU or other specialized hardware), what would be the best way to use that $10k? vLLM plus multiple networked (10/25/100 Gbit) systems?
You probably don't need fp16. Most models can be quantized down to q8 with minimal loss of quality. Models can usually be quantized to q4 or even below and run reasonably well, depending on what you expect out of them.
Even at q8, you'll need around 235GB of memory for the weights alone. An Nvidia RTX 5090 has 32GB of VRAM and has an official price of about $2000, but usually retails for more. If you can find them at that price, you'd need eight of them (256GB of VRAM, about $16,000) to run a 235GB model entirely in VRAM, and that doesn't include a motherboard and CPU that can handle eight GPUs. You could look for old mining rigs built from RTX 3090s or P40s. Otherwise, I don't see much prospect for fitting this much data into VRAM on consumer GPUs for under $10k.
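To make the arithmetic above concrete, here is a rough back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it ignores the KV cache, activations, and the per-block overhead that real quantization formats add, so treat the results as lower bounds. The function names are my own, not from any library.

```python
import math

def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate weight memory in GB (1 GB = 1e9 bytes), weights only."""
    return params_billions * bits_per_weight / 8

def gpus_needed(memory_gb, vram_gb_per_gpu=32):
    """Smallest number of GPUs whose combined VRAM holds memory_gb."""
    return math.ceil(memory_gb / vram_gb_per_gpu)

# 235B parameters at three precisions, on 32GB cards such as the RTX 5090.
for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4)]:
    mem = weight_memory_gb(235, bits)
    print(f"{name}: {mem:.1f} GB of weights, {gpus_needed(mem)} x 32GB GPUs")
```

Even at q4 this works out to four 32GB cards before leaving any room for context, which is why VRAM-only setups blow past the budget.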
Without a fast interconnect like NVLink, you're going to take a massive performance hit running a model distributed over several computers. It can be done, and there's research into optimizing distributed models, but the link between machines is a significant bottleneck. For now, you really want to run on a single machine.
You can get pretty good performance out of a CPU. The key is memory bandwidth. Look at server or workstation class CPUs with a lot of DDR5 memory channels that support a high MT/s rate. For example, an AMD Ryzen Threadripper PRO 7965WX has eight DDR5 memory channels at up to 5200 MT/s, which is roughly 333 GB/s of theoretical peak bandwidth, and retails for about $2500. Depending on your needs, this might give you acceptable performance.
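Here is a rough sketch of why memory bandwidth is the key number. It assumes each DDR5 channel moves 8 bytes per transfer, and that generating one token requires reading every active weight from memory once, so bandwidth divided by bytes read per token gives a ceiling on tokens per second. Real throughput will be lower. The function names are hypothetical, and the 22 GB figure for a mixture-of-experts case is an illustrative assumption, not something stated above.

```python
def peak_bandwidth_gb_s(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth in GB/s (assumes 64-bit channels)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000

def tokens_per_second_upper_bound(bandwidth_gb_s, gb_read_per_token):
    """Ceiling on decode speed if every token reads gb_read_per_token from RAM."""
    return bandwidth_gb_s / gb_read_per_token

# Eight DDR5 channels at 5200 MT/s, as in the Threadripper example.
bw = peak_bandwidth_gb_s(8, 5200)
print(f"peak bandwidth: {bw:.1f} GB/s")

# Dense case: all 235 GB of q8 weights are read for every token.
print(f"dense q8 ceiling: {tokens_per_second_upper_bound(bw, 235):.2f} tok/s")

# Assumed mixture-of-experts case: only ~22 GB of weights active per token.
print(f"MoE q8 ceiling (assumed 22 GB active): "
      f"{tokens_per_second_upper_bound(bw, 22):.2f} tok/s")
```

The gap between the dense and sparse cases is the reason "acceptable performance" depends so heavily on the model's architecture and the quantization you pick.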
Lastly, I'd question whether you really need to run this at home. Obviously, this depends on your situation and what you need it for. Any investment you put into hardware is going to depreciate significantly in just a few years. $10k of credits in the cloud will take you a long way.