|
|
|
|
|
by behohippy
310 days ago
|
|
Used 3090s have been getting expensive in some markets. Another option is dual 5060ti 16 gig. Mine are lower powered, single 8 pin power, so they max out around 180W. With that I'm getting 80t/s on the new qwen 3 30b a3b models, and around 21t/s on Gemma 27b with vision. Cheap and cheerful setup if you can find the cards at MSRP. |
|
That gives us a total TDP of around 150W, 48 GB of VRAM and we can run Qwen 3 Coder 30B A3B at 4bit quantization with up to 32k context at around 60-70 t/s with Ollama. I also tried out vLLM, but the performance surprisingly wasn't much better (maybe under bigger concurrent load). Felt like sharing the data point, because of similarity.
Honestly it's a really good model, even good enough for some basic agentic use (e.g. with Aider, RooCode and so on), MoE seems the way to go for somewhat limited hardware setups.
Ofc obviously not recommending L4 cards cause they have a pretty steep price tag. Most consumer cards feel a bit power hungry and you'll probably need more than one to fit decent models in there, though also being able to game with the same hardware sounds pretty nice. But speaking of getting more VRAM, the Intel Arc Pro B60 can't come soon enough (if they don't insanely overprice it), especially the 48 GB variety: https://www.maxsun.com/products/intel-arc-pro-b60-dual-48g-t...