by coder543
79 days ago
For the many DGX Spark and Strix Halo users with 128GB of memory, I believe the ideal model would be a MoE with close to 200B total parameters and a low active count of 3B to 10B. I would personally love to see a super sparse 200B A3B model, just to see what is possible. These machines don't have a lot of memory bandwidth, so a low active count is essential for good generation speed, while a high total parameter count gives the model greater capability and knowledge. A Q4 QAT release would also be essential, of course: at roughly 4 bits per weight, the 200B model's weights would take up ~100GB of memory, not including the context. The common 120B size these days leaves a lot of memory unused on these machines. I would also like the larger models to support audio input, not just the E2B/E4B models. And audio output would be great too!
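As a rough back-of-the-envelope check of the sizing argument above, here is a minimal Python sketch. It assumes exactly 4 bits per weight (real Q4 formats carry some extra overhead for scales), and it treats decode speed as purely bandwidth-bound, with every active weight read once per token. The 250 GB/s bandwidth figure is an illustrative placeholder, not a measured spec for either machine, and the function names are made up for this example.

```python
# Back-of-the-envelope sizing for a quantized MoE model.
# Assumptions (not from any vendor spec): exactly 4 bits per weight,
# and decode speed limited only by reading the active weights once per token.

def weight_bytes(params: float, bits_per_weight: float = 4.0) -> float:
    """Approximate memory needed for the weights, in bytes."""
    return params * bits_per_weight / 8


def decode_ceiling_tokens_per_s(
    active_params: float,
    bandwidth_bytes_per_s: float,
    bits_per_weight: float = 4.0,
) -> float:
    """Upper bound on tokens/s if each token must stream all active weights.

    Ignores KV cache reads, compute time, and any other overhead,
    so real throughput will be lower.
    """
    return bandwidth_bytes_per_s / weight_bytes(active_params, bits_per_weight)


ASSUMED_BANDWIDTH = 250e9  # bytes/s; hypothetical placeholder value

for total in (120e9, 200e9):
    gb = weight_bytes(total) / 1e9
    print(f"{total / 1e9:.0f}B total params at 4-bit: ~{gb:.0f} GB of weights")

for active in (3e9, 10e9):
    ceiling = decode_ceiling_tokens_per_s(active, ASSUMED_BANDWIDTH)
    print(f"{active / 1e9:.0f}B active params: at most ~{ceiling:.0f} tokens/s")
```

Under these assumptions, 200B parameters come to about 100 GB of weights and 120B to about 60 GB, which is where the "unused memory" point comes from. The tokens/s figures are only ceilings, but they show why the active count matters so much: dropping from 10B to 3B active parameters raises the bandwidth-bound limit by more than 3x.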