| HN Mirror

The datasheet isn't telling you the quantization (intentionally). Model weights at FP16 are roughly 2GB per billion params. A 200B model at FP16 would take 400GB just to load the weights; a single DGX Spark has 128GB. Even two networked together couldn't do it at FP16.

You can do it, if you quantize to FP4 — and Nvidia's special variant of FP4, NVFP4, isn't too bad (and it's optimized on Blackwell). Some models are even trained at FP4 these days, like the gpt-oss models. But gigabytes are gigabytes, and you can't squeeze 400GB of FP16 weights into only 128GB (or 256GB) of space.

The datasheet is telling you the truth: you can fit a 200B model. But it's not saying you can do that at FP16 — because you can't. You can only do it at FP4.