by christina97
2 hours ago
The Chinese models will not overtake the frontier US ones given the way things are currently going. The US models derive their lead from incredible efforts to source more and higher-quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher-quality training data from the US frontier models. For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip: the Chinese labs must entirely retool from harvesting frontier-model data to building the systems and efforts that produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.
You don't know what's happening at z.ai or Alibaba. And you don't know what's happening at Anthropic and OpenAI.
I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over the GLM 5.2 weights, even if it's just to see where the competition is.
Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.
They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.
Plus, you are discounting other risks: Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.
btw: This could be the case in China as well. Their administration has been surprisingly open on AI exports and open-weight models, as far as we know. There is a small but non-trivial chance they are hogging a better version of GLM 5.2, for example, and no one is allowed to talk about it. Now, I am not saying that is the case. I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.