by simonw
219 days ago
> There’s honestly not a reason they have to be 1T parameters and cost an insane amount to train and run on inference.

Kimi K2 Thinking is rumored to have cost $4.6m to train - according to "a source familiar with the matter": https://www.cnbc.com/2025/11/06/alibaba-backed-moonshot-rele...

I think the most interesting recent Chinese model may be MiniMax M2, which is just 200B parameters but benchmarks close to Sonnet 4, at least for coding. That's small enough to run well on ~$5,000 of hardware, as opposed to the 1T models which require vastly more expensive machines.
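
For a rough sense of why parameter count drives hardware cost, here is a back-of-the-envelope sketch. It assumes the weights dominate memory use and ignores KV cache, activations, and runtime overhead; the bit widths and the helper name `weight_memory_gb` are illustrative assumptions, not figures from any vendor.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    # Hypothetical quantization levels, chosen for illustration only.
    for name, params in [("200B model", 200), ("1T model", 1000)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.0f} GB of weights")
```

Under these assumptions a 200B model needs about 100 GB for weights at 4 bits per weight, while a 1T model needs about 500 GB at the same precision, which is roughly the gap between one high-memory machine and a much larger setup.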