Hacker News
by irthomasthomas 7 days ago
I don't understand, given all they say, why this would not be made available to everyone at once? Why the limited release? They should have no trouble scaling it if it runs on a single rack.
7 comments

Maybe they don't have enough racks. News reports indicate that China is short on GPUs, so they probably want to keep most of them for other work. And since the price is so low, they probably want to use the other GPUs for work with higher margins.
Because presumably then it won't be 1000 t/s for everyone anymore given hardware limitations?
The TileRT approach trades throughput for lower latency, which also means lower overall efficiency.

Given the export restrictions, this could mean they need to prioritise how best to use their limited hardware. But they could also be moving to Huawei GPUs like DeepSeek did, and simply not have stable hardware or software for a large-scale deployment yet.

This is just speculation based on the MXFP4 support on Huawei GPUs, which is lacking on some NVIDIA GPUs.

Chinese companies are blocked from buying modern ASML lithography machines. The most modern scanner China is still allowed to buy is NXT:1980i from 2015.
Obviously it uses significantly more resources. And/or they have to configure or reconfigure servers for it, which takes time and doesn't make sense until they have proven the demand at the higher price point.
I wonder about this too. The other objections miss the point: if it's faster, and otherwise the same, and doesn't require different hardware, then why not just announce that the standard tier of MiMo-v.25-Pro is now ridiculously fast and raise the price? What does "limited high speed resources" mean if it runs on the same hardware as the rest of their pool?

I think the answer is that there's a tradeoff here where additional throughput for a single person can be achieved only by tying up more resources than a normal request would, even when you take into account the fact that the normal request takes longer to finish. I'm not an expert, but some of the optimizations they describe, particularly the parallel prediction stuff, sound like they could take up extra resources.
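To make that tradeoff concrete, here is a toy model, not anything from the announcement. It assumes each decode step costs a fixed amount of time (mostly reading weights) plus a small amount per request in the batch. The function name `decode_stats` and the numbers `fixed_ms` and `per_seq_ms` are made up for illustration; real serving stacks behave in more complicated ways.

```python
def decode_stats(batch_size, fixed_ms=1.0, per_seq_ms=0.05):
    """Toy cost model for one decode step (all numbers are hypothetical).

    step time      = fixed overhead + per-request cost * batch size
    per-user speed = one token per step for each request in the batch
    aggregate      = batch size * per-user speed
    """
    step_ms = fixed_ms + per_seq_ms * batch_size
    per_user_tps = 1000.0 / step_ms           # tokens/s seen by one user
    aggregate_tps = batch_size * per_user_tps  # tokens/s produced by the hardware
    return per_user_tps, aggregate_tps


if __name__ == "__main__":
    for batch in (1, 8, 64):
        per_user, aggregate = decode_stats(batch)
        print(f"batch={batch:3d}  per-user={per_user:8.1f} t/s  aggregate={aggregate:9.1f} t/s")
```

Under these assumptions a small batch gives each user very fast output, but the same hardware produces far fewer total tokens per second, so each fast request ties up more of the pool than a normal one.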

> and doesn't require different hardware

But it may well do. They mention TileRT in the announcement, so this speed comes from low-level optimization for some specific GPU target.

With SOTA Western GPUs scarce in China, they may well have a mishmash of different GPUs.

They specifically said it's stock hardware, but... yeah, maybe highly specific stock hardware.
Maybe they only have a finite number of racks ;-)