by epolanski
11 days ago
1. DeepSeek V4 is still in preview (training is not finished).
2. Qwen is much more demanding and borderline unusable on consumer hardware because it's a dense model. All 27B parameters are active for every token. It's not a MoE architecture, where a router activates only a subset of them.
3. Qwen doesn't like quantization at all.
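To make the dense vs. MoE point concrete, here is a rough sketch of how many parameters are touched per token. The 27B dense figure is from the comment above; the MoE numbers (shared parameters, expert size, experts per token) are made up for illustration and do not describe any real model.

```python
def active_params_dense(total_params: float) -> float:
    """A dense model uses every parameter for every token."""
    return total_params


def active_params_moe(shared_params: float,
                      params_per_expert: float,
                      experts_per_token: int) -> float:
    """A MoE model uses the shared layers plus only the experts the router picks."""
    return shared_params + params_per_expert * experts_per_token


# Dense 27B model: all 27B parameters are active for each token.
dense_active = active_params_dense(27e9)

# Hypothetical MoE (illustrative numbers only): 2B shared parameters,
# 0.5B per expert, router selects 8 experts per token.
moe_active = active_params_moe(shared_params=2e9,
                               params_per_expert=0.5e9,
                               experts_per_token=8)

print(f"dense active params: {dense_active / 1e9:.1f}B")
print(f"MoE active params:   {moe_active / 1e9:.1f}B")
```

Per-token compute scales roughly with the active count, which is why a dense 27B model can feel heavier than a MoE with a much larger total size, as long as the MoE's weights fit in memory at all.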
Settings: RTX 5090, 5-bit weights (Unsloth), FP8 KV cache.
Last time I tried running large MoEs on this PC, they had inferior quality at 2-3 bits compared to much smaller dense models at 5-6 bits, and were slower anyway.
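For a rough sense of why bit width matters so much on a single card, here is a back-of-the-envelope estimate of weight memory. It assumes a flat bits-per-weight figure and ignores quantization overhead (scales, mixed-precision layers), the KV cache, and activations, so treat the results as lower bounds. The 32 GB figure is the RTX 5090's VRAM.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


VRAM_BYTES = 32e9   # RTX 5090, decimal gigabytes
N_PARAMS = 27e9     # the dense 27B model discussed above

for bits in (2, 3, 5, 6, 16):
    size = weight_bytes(N_PARAMS, bits)
    fits = "fits" if size < VRAM_BYTES else "does not fit"
    print(f"{bits:>2}-bit: {size / 1e9:6.2f} GB of weights ({fits} in 32 GB)")
```

The same formula applied to a much larger MoE shows why it has to drop to 2-3 bits to fit at all, which is where the quality loss I mentioned comes from.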