by mwcampbell
6 days ago
I invested about $4,000 in an NVIDIA DGX Spark several months ago. It has 128 GB of unified RAM and the NVIDIA GB10 chip. With the RAM, the CPU cores, and the 4 TB NVMe SSD, it's a very capable ARM64 Linux computer even without the GPU, and so far I've mostly been using it as such. But I wonder: what's the most capable model, specifically for coding, that can run well on that hardware?
|
The Qwen3.6-35B-A3B planner hums along at 50-55 tokens/s, and the Qwen3-Coder-30B-A3B-Instruct coder does 30-35 tokens/s. With both agents up and ready to work, RAM consumption sits at about 112 of 128 GB.
It's pretty okay. I'm faffing around with having it disassemble old MS-DOS games from the 1980s, which is a task that lends itself well to the setup. It's not the fastest thing in the world, but with the planner's context window at 256k tokens and the coding agent at 128k, they chew through pretty long task lists handing things back and forth without complaint. The only real issue is that even with really tightly scoped prompts, the coding agent tends to hallucinate like it's on LSD. But the planning agent appears to be quite good at spotting the hallucinations and re-parceling work back to the coder.
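For anyone curious what that hand-off looks like in code, here's a minimal sketch of the loop, not my actual setup. Everything in it is hypothetical: `run_task`, `Review`, and the two stub functions stand in for real calls to the planner and coder models, and the retry limit of 3 is arbitrary.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Review:
    """Planner's verdict on one piece of coder output (hypothetical type)."""
    ok: bool
    feedback: str = ""


def run_task(
    task: str,
    coder: Callable[[str], str],
    reviewer: Callable[[str, str], Review],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Send a task to the coder, let the planner review it, and re-issue it on rejection."""
    prompt = task
    for attempt in range(1, max_attempts + 1):
        result = coder(prompt)
        review = reviewer(task, result)
        if review.ok:
            return result, attempt
        # Re-parcel the work: same task, plus the planner's objection.
        prompt = f"{task}\n\nPrevious attempt was rejected: {review.feedback}"
    raise RuntimeError(f"no accepted result after {max_attempts} attempts")


# Stubs standing in for the two models (assumptions, for illustration only).
def fake_coder(prompt: str) -> str:
    if "rejected" in prompt:
        return "sub_1234: reads the keyboard via INT 16h"
    return "sub_1234: initializes DirectX"  # a hallucination


def fake_planner_review(task: str, result: str) -> Review:
    if "DirectX" in result:
        return Review(False, "DirectX did not exist on 1980s MS-DOS")
    return Review(True)


result, attempts = run_task("Describe sub_1234", fake_coder, fake_planner_review)
print(attempts, result)
```

The point of keeping the reviewer separate is that the coder never gets to grade its own work; the planner only has to spot a bad answer, which seems to be easier than producing a good one.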
It's neat. I'm going to be sad when I have to return the review unit in a couple of months.
Edit: I have also been fiddling with DeepSeek v4 Flash via antirez's setup (https://github.com/antirez/ds4), and it's pretty fantastic (and fantastically easy to get running). It's pretty pokey on the Spark, though, at 14-ish tokens/s. And unless you have a second Spark, it's going to be the only model you run at one time, as it eats alllll the rams.
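To put those rates in perspective, here's some back-of-the-envelope arithmetic using only the numbers quoted above. The 1,000-token response size is an arbitrary assumption, "14-ish" is treated as exactly 14, and the helper name `seconds_for` is made up for this sketch.

```python
def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady decode rate."""
    return tokens / tokens_per_second


# (low, high) tokens/s as quoted in this comment.
rates = {
    "Qwen3.6-35B-A3B planner": (50, 55),
    "Qwen3-Coder-30B-A3B-Instruct coder": (30, 35),
    "DeepSeek v4 Flash (ds4)": (14, 14),  # "14-ish", taken as 14
}

RESPONSE_TOKENS = 1000  # assumed response length, for illustration

for name, (low, high) in rates.items():
    slowest = seconds_for(RESPONSE_TOKENS, low)
    fastest = seconds_for(RESPONSE_TOKENS, high)
    print(f"{name}: {fastest:.0f}-{slowest:.0f} s per {RESPONSE_TOKENS} tokens")

# RAM left over with both Qwen agents loaded: about 112 of 128 GB in use.
headroom_gb = 128 - 112
print(f"headroom with both agents loaded: {headroom_gb} GB")
```

This ignores prompt processing time and any slowdown at long context, so treat it as a rough floor on latency rather than a benchmark.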