


	by theanonymousone 7 days ago
	On OpenRouter, the Moonshot provider for Kimi K2. 7 Code has an "int4" tag. Isn't that too low, particularly coming from the very developer of the model? Is that a mistake? What about their direct API offering?


kouteiheika 7 days ago

The model is natively quantized (i.e. it was trained that way in the first place, so this is not a post-training quantization which degrades performance).
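A minimal Python sketch of the distinction, assuming simple symmetric int4 rounding with one fixed scale (Moonshot's actual QAT recipe isn't described in this thread). With post-training quantization, the float weights the model was trained and evaluated with are rounded afterwards, so the shipped network differs from the trained one. With QAT, every training forward pass already runs on quantize-then-dequantize ("fake-quantized") weights, so the exported int4 network is exactly the one the loss was computed on. The weights, inputs, and scale are made-up values.

```python
QMIN, QMAX = -8, 7  # representable range of a signed 4-bit integer

def quantize(w, scale):
    """Map real weights to int4 codes: round to nearest, then clamp to [-8, 7]."""
    return [max(QMIN, min(QMAX, round(x / scale))) for x in w]

def dequantize(q, scale):
    return [c * scale for c in q]

def fake_quant(w, scale):
    """Quantize-then-dequantize: what a QAT forward pass uses in place of w."""
    return dequantize(quantize(w, scale), scale)

def linear(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

scale = 0.1
x = [1.0, -2.0, 0.5, 3.0]

# PTQ: the model was trained and evaluated with these float weights...
float_weights = [0.33, -0.47, 0.12, 0.26]
trained_output = linear(float_weights, x)
# ...but what ships is a rounded copy, so the output drifts (about 2.11 -> 2.25).
ptq_output = linear(dequantize(quantize(float_weights, scale), scale), x)

# QAT: training forward passes already used fake_quant(master_weights),
# so the network the loss saw is exactly the int4 network that is exported.
master_weights = float_weights  # stand-in for the weights at the end of QAT
qat_train_output = linear(fake_quant(master_weights, scale), x)
qat_export_output = linear(dequantize(quantize(master_weights, scale), scale), x)
```

The sketch only shows why QAT adds no extra rounding error at export time. The real benefit comes from the optimizer adjusting the master weights so that the rounded network performs well, which this toy does not simulate.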

knollimar 7 days ago

Isn't it only partially quantized? I thought some dense parts stay at higher precision, but most of it is int4?

wgd 7 days ago

In MoE models, the experts are often quantized while the shared portions, being a much smaller part of the network with greater impact, are kept at higher or full precision. I'm not familiar with the Kimi QAT approach specifically, but it's likely they do this.
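A back-of-the-envelope Python sketch of that mixed-precision trade-off, assuming a hypothetical policy (routed-expert weights in int4, attention and router weights in bf16) and a made-up single-layer parameter layout. The tensor names imitate a common Hugging Face naming style but are illustrative only, not taken from the Kimi checkpoint, and the per-group scales that int4 weights need are ignored.

```python
BYTES_PER_PARAM = {"int4": 0.5, "bf16": 2.0}

def precision_for(name: str) -> str:
    # Hypothetical policy: routed experts -> int4, everything else (attention, router) -> bf16.
    return "int4" if ".experts." in name else "bf16"

def checkpoint_bytes(param_counts: dict[str, int], policy=precision_for) -> float:
    """Sum storage for each tensor under a precision policy (quantization scales ignored)."""
    return sum(n * BYTES_PER_PARAM[policy(name)] for name, n in param_counts.items())

# Toy single-layer MoE: 8 routed experts dominate the parameter count.
params = {
    "layers.0.self_attn.q_proj.weight": 250_000,
    "layers.0.self_attn.o_proj.weight": 250_000,
    "layers.0.mlp.gate.weight": 10_000,  # router
    **{f"layers.0.mlp.experts.{i}.down_proj.weight": 1_000_000 for i in range(8)},
}

mixed = checkpoint_bytes(params)
all_bf16 = checkpoint_bytes(params, policy=lambda name: "bf16")
print(f"mixed: {mixed / 1e6:.2f} MB, all-bf16: {all_bf16 / 1e6:.2f} MB")
# mixed: 5.02 MB, all-bf16: 17.02 MB
```

In this toy, the non-expert tensors are about 6% of the parameters but roughly 20% of the bytes: a modest cost for keeping the high-impact parts at full precision.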

theanonymousone 7 days ago

But the Hugging Face link mentions BF16, F16, and I32?

kouteiheika 7 days ago

Not every weight is quantized. For example, weights that don't take much space, or that are highly important, are left in higher precision. State-of-the-art weight quantization is never done uniformly (i.e. applied to all weights in the same way).
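One common form of non-uniform quantization is group-wise scaling: instead of one scale for a whole tensor, each small group of weights gets its own. This Python sketch, assuming symmetric int4 rounding and a made-up weight vector with one large outlier, compares a single per-tensor scale with per-group scales. It illustrates the general technique, not the specific scheme used for Kimi.

```python
def quantize_group(w):
    """Symmetric int4: one scale per group, chosen so the largest |w| maps to code 7."""
    scale = max(abs(x) for x in w) / 7 or 1.0  # avoid a zero scale for all-zero groups
    codes = [max(-8, min(7, round(x / scale))) for x in w]
    return codes, scale

def roundtrip(w, group_size):
    """Quantize w in consecutive groups of group_size and return the dequantized values."""
    out = []
    for start in range(0, len(w), group_size):
        codes, scale = quantize_group(w[start:start + group_size])
        out.extend(c * scale for c in codes)
    return out

def total_abs_error(w, w_hat):
    return sum(abs(a - b) for a, b in zip(w, w_hat))

# Mostly small weights plus a single large outlier.
weights = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01, 0.005, 2.0]

per_tensor = roundtrip(weights, group_size=len(weights))  # one scale for everything
per_group = roundtrip(weights, group_size=4)              # one scale per 4 weights

# Roughly 0.085 for per-tensor vs 0.038 for per-group.
print(f"{total_abs_error(weights, per_tensor):.3f} {total_abs_error(weights, per_group):.3f}")
```

With a single scale, the outlier forces a step size of about 0.29, so every small weight rounds to zero; with groups of four, only the group that contains the outlier loses its small weights.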

zackangelo 7 days ago

I don't believe safetensors has a native int4 dtype, so they packed four int4 values into each bf16 element in this checkpoint.
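A Python sketch of that kind of bit-packing, assuming signed int4 values stored as two's-complement nibbles, least-significant nibble first. A 16-bit container holds four values and a 32-bit integer container would hold eight. The nibble order and container width here are assumptions, not the checkpoint's documented layout, so check the model card or loader code before relying on it. If the bf16 dtype is only a container, its 16 bits are reinterpreted as four nibbles rather than read as a floating-point number.

```python
def pack_int4(values, container_bits=16):
    """Pack signed int4 values (-8..7) into unsigned container_bits-wide words.

    Value i of a word occupies bits [4*i, 4*i + 4), least-significant nibble first
    (an assumed order).
    """
    per_word = container_bits // 4
    if len(values) % per_word:
        raise ValueError("length must be a multiple of values-per-word")
    words = []
    for start in range(0, len(values), per_word):
        word = 0
        for i, v in enumerate(values[start:start + per_word]):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} does not fit in int4")
            word |= (v & 0xF) << (4 * i)  # & 0xF gives the 4-bit two's-complement pattern
        words.append(word)
    return words

def unpack_int4(words, container_bits=16):
    per_word = container_bits // 4
    values = []
    for word in words:
        for i in range(per_word):
            nibble = (word >> (4 * i)) & 0xF
            values.append(nibble - 16 if nibble >= 8 else nibble)  # sign-extend
    return values

vals = [-8, -1, 0, 7, 3, -4, 5, 2]
words16 = pack_int4(vals)                     # four values per 16-bit word
words32 = pack_int4(vals, container_bits=32)  # eight values per 32-bit word
print([hex(w) for w in words16], [hex(w) for w in words32])
# ['0x70f8', '0x25c3'] ['0x25c370f8']
```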