Hacker News
by wirybeige 22 days ago
DS4 Pro/Flash were post-trained with QAT, so they are already quantized to FP4 for the most part. That's why the downloaded weights are much smaller than they would be at FP8 or FP16. For example, Flash is a 284B model, but its weights are only ~160 GB. Of course, DeepInfra may have gone even further, but there is no proof of that.
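As a rough back-of-the-envelope check of the size argument, here is a sketch that estimates raw weight storage for a 284B-parameter model at a few uniform precisions. It assumes decimal gigabytes (1 GB = 10^9 bytes) and ignores quantization scale factors, tensors kept at higher precision, and file-format overhead, so a real checkpoint will be somewhat larger than these figures.

```python
# Back-of-the-envelope weight storage for a given parameter count.
# Assumptions: every weight uses the same precision, sizes are decimal GB
# (1e9 bytes), and scale factors / file-format overhead are ignored.

BITS_PER_PARAM = {"FP16": 16, "FP8": 8, "FP4": 4}


def weight_size_gb(num_params: float, bits_per_param: float) -> float:
    """Return raw weight storage in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    n = 284e9  # parameter count quoted for Flash
    for name, bits in BITS_PER_PARAM.items():
        print(f"{name}: ~{weight_size_gb(n, bits):.0f} GB")
```

Under these assumptions FP16 comes to ~568 GB, FP8 to ~284 GB, and pure FP4 to ~142 GB, so a ~160 GB download fits weights that are mostly FP4 plus some higher-precision tensors and overhead.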

Interesting then that OpenRouter[1] tags many providers as FP8 and DeepInfra as FP4.

1. https://openrouter.ai/deepseek/deepseek-v4-pro

I presume the providers are the ones giving that info to OpenRouter? I mean, technically it is a mix of FP8 and FP4 (although it is predominantly FP4), so I don't think either label is inaccurate.
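The thread doesn't give a per-tensor breakdown, but the "mostly FP4, partly FP8" description can be sanity-checked against the quoted numbers. The sketch below is only a consistency check under simplifying assumptions, not a measurement of the actual checkpoint layout: it assumes every weight is stored as either FP8 or FP4, ignores metadata, and uses the approximate 284B / ~160 GB figures from above. If the FP4 weights carry per-block scale factors, as common FP4 formats do, the true FP8 share would be lower than this estimate.

```python
# Sketch: what FP8/FP4 mix would explain a given checkpoint size?
# Assumptions (for illustration only): each weight is stored as either FP8
# (8 bits) or FP4 (4 bits); scale factors and metadata are ignored; sizes
# are decimal GB (1e9 bytes).


def effective_bits_per_param(size_gb: float, num_params: float) -> float:
    """Average storage bits per parameter implied by the file size."""
    return size_gb * 1e9 * 8 / num_params


def implied_fp8_fraction(size_gb: float, num_params: float) -> float:
    """Fraction of parameters in FP8 (the rest FP4) that matches size_gb.

    Solves 8*f + 4*(1 - f) = effective_bits for f.
    """
    bits = effective_bits_per_param(size_gb, num_params)
    frac = (bits - 4) / (8 - 4)
    if not 0.0 <= frac <= 1.0:
        raise ValueError("size is not explainable by an FP8/FP4 mix alone")
    return frac


if __name__ == "__main__":
    size_gb, n = 160, 284e9  # approximate figures quoted in the thread
    print(f"effective bits/param: {effective_bits_per_param(size_gb, n):.2f}")
    print(f"implied FP8 share: {implied_fp8_fraction(size_gb, n):.0%}")
```

With these inputs the average works out to roughly 4.5 bits per parameter, i.e. on the order of 13% FP8 and the rest FP4, which matches the "predominantly FP4" description.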