| HN Mirror

	by christina97 2 hours ago
	The Chinese models will not overtake the frontier US ones given the current way things are going. The US models derive their lead from incredible efforts to source more and higher-quality (mostly synthetic) data via great feats (e.g. generating with humongous teacher models that could never feasibly serve interactive traffic). The Chinese models advance via heroic efforts to optimize models and great feats to secure more and higher-quality training data from the US frontier models. For a (Chinese) open-weight model to surpass the (US lab) frontier models, this equation must flip and the Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data, as well as procuring latest-generation hardware en masse for this. This does not happen easily. Also, training a frontier-scale model is actually not such an unimaginable feat: doing all the inference with the teacher models is where the hardware goes.

8 comments

throwawayffffas 2 hours ago

Unless you are working at one of these companies you don't know what they are doing.

You don't know what's happening in z.ai or Alibaba. And you don't know what's happening in Anthropic and OpenAI.

I don't know what they are all doing, but I find it extremely unlikely that they are not all collecting data from one another. I am confident Anthropic has a team going over GLM 5.2 weights even if it's just to see where the competition is.

Just because some labs are getting data from Anthropic does not mean they are not also doing their own research.

They were focused on optimization because they could not get the best hardware. The only reason their top labs are behind may be that they did not have H200s and MI350s. And now they do.

Plus you are discounting other risks, Anthropic is currently sitting on "the best" models in the world because they got in a pissing match with the US administration.

btw: This could be the case in China as well; their administration has been surprisingly open on AI exports and open-weight models, as far as we know. There is a very small but non-trivial chance they are hogging a better version of GLM 5.2, for example, but no one is allowed to talk about it. Now I am not saying that is the case; I am saying the two cases (Chinese labs are 6 months behind, or they are forced to suppress their best models) are indistinguishable.

andy99 2 hours ago

> Chinese labs must entirely retool from harvesting frontier model data to producing the data systems and efforts to produce novel data

Even if your characterization is accurate, they could do this tomorrow and are not so myopic that they wouldn’t have thought about it. I don’t see this as a barrier, and I see a lot of the same underestimation of Asia that’s been happening for 50 years. There’s not some innate American advantage to building LLMs, and personally I think whatever head start the US has is going to be squandered on delays from the export control “too dangerous for release” LARPing we’re seeing.

ant-kinesthetic 2 hours ago

Exactly. If they wanted to, they could produce the same amount of data. Companies like Scale, Mercor, and Surge exist for a reason, a reason that doesn't need to exist in China if they mandate Chinese enterprises to provide all their real-world data (or have them work inside RL environments) to the model companies for post-training. There is no real advantage that US companies have except a head start, and as Jensen said, a ton of the research advantage is skewed since a lot of the best researchers in the US are Chinese nationals. I do think the model is just one piece of the pie (not to echo Jensen too much), and hopefully we will always be able to serve these bigger frontier models in a much more efficient way, as well as build out the application layer faster, which is what actually makes them useful and/or more dangerous/powerful.

s1artibartfast 2 hours ago

Why would those have any impact on R&D speed? Most are funded and close to cash flow positive

yorwba 1 hour ago

The amount of data Anthropic has claimed was extracted for distillation is tiny in comparison to the entire internet, which is right there for the taking and holds most of the knowledge people expect models to have.

Distilling even with small amounts of data from a better model is still helpful, but not in the sense of transferring capabilities the raw internet-trained model doesn't have at all, but for identifying those capabilities that are compatible with the servile assistant persona and suppressing others that are undesirable (e.g. trolling). A primitive version of this were instruction-tuning datasets generated with ChatGPT, as used e.g. for Alpaca.

Without a clear target to emulate, competitors might have to rely more on human raters, but there are plenty of data labeling companies in China, so that's hardly a hurdle.

bradishungry 2 hours ago

“China can only copy the US” is a very short-sighted and uninformed opinion. There is more coming out of China than just new ways to distill models.

CuriouslyC 2 hours ago

Coding is a case where it's possible to programmatically generate large amounts of data relatively cheaply. China could realistically surpass the US in coding while still being behind in many other areas.

kulahan 2 hours ago

How so? You'll soon have your choice of a very old OAI model or a new Chinese model, because the USG has no interest in letting you access the newest models without explicit permission.

nomel 2 hours ago

Their point is that the Chinese models will also be limited to the very old OAI models unless things flip, as they said.

The use of US models for Chinese model training is part of the motivation of all of this.

kulahan 2 hours ago

Apologies - I was too quick in my response. I was speaking from a "how the users will perceive it" point of view. China's pretty good at the internet reputation thing.

elisbce 2 hours ago

Chinese frontier models don't need to catch up in every category. They just need to win in coding and that's exactly where they are going. The gap went from 12+ months to 1-2 months with the latest release of GLM 5.2. Coding is a task where you don't need heroic efforts to find rare and long-tail training data; you can just outsmart your competitor by optimizing algorithms and training recipes. This is something they can do at scale with the money and talent pool.

Octoth0rpe 2 hours ago

> They just need to win in coding and that's exactly where they are going.

They don't even need to 'win' in the sense of maxing the benchmark. They can be 20% worse/50% cheaper and many of us (and our managers who approve our token budgets) will be in.

DeepSeek is 30x cheaper for input/75x cheaper for output than Sonnet on OpenRouter, and it's not a whole lot worse for many things.
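
To put rough numbers on that: the overall saving depends on how a workload splits between input and output tokens, and it always lands somewhere between the two ratios. A minimal sketch, where only the 30x and 75x ratios come from the figures quoted here; the per-token prices and the token mix are made-up placeholders, not real price-list values:

```python
def blended_cost_ratio(input_tokens, output_tokens, in_price, out_price,
                       in_discount=30.0, out_discount=75.0):
    """Return how many times cheaper the discounted model is for one workload.

    in_price / out_price are the expensive model's prices per token
    (hypothetical placeholders). The cheaper model is assumed to cost
    in_price / in_discount and out_price / out_discount.
    """
    expensive = input_tokens * in_price + output_tokens * out_price
    cheap = (input_tokens * in_price / in_discount
             + output_tokens * out_price / out_discount)
    return expensive / cheap


# Hypothetical workload: 10 input tokens for every output token,
# with output priced at 5x input on the expensive model.
ratio = blended_cost_ratio(input_tokens=10, output_tokens=1,
                           in_price=3.0, out_price=15.0)
print(f"about {ratio:.1f}x cheaper overall")
```

Input-heavy workloads (long context, short answers) sit near the 30x end; output-heavy ones drift toward 75x.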

bijowo1676 1 hour ago

Anthropic/OpenAI's valuations are built on the assumption of capturing most of the market and having the pricing power to jack up prices for tokens.

Kneecapping their pricing power is enough to trigger a valuation reset of an order of magnitude and humble them a bit.

Plus there are always infrastructure and hardware providers who want to keep their share of profits and will squeeze Anthropic's margins to deflate their valuation (Nvidia, AWS, RAM manufacturers, etc.).

jmyeet 2 hours ago

Yeah, this is, to be perfectly blunt, cope, for several reasons:

1. It's unclear if there is a law of diminishing returns with ever-larger models. They're more expensive to run and for many applications, you'll probably find smaller models are sufficient;

2. There's an inbuilt market for local LLMs. This is an effective limit on how large models can get. Case law hasn't been established yet on, for example, if a law firm using ChatGPT breaks privilege. Specifically, chat logs may be discoverable. Medical applications have this issue too and I think you'll find that financial firms are going to be leery about this as well;

3. Better, larger models will bleed into smaller, open source models. The chat logs themselves are training data. There's a whole market in China for Claude tokens around this;

4. China has a national security interest in not being beholden to US tech giants when it comes to AI. China has a history of being able to commit to large-scale long-term projects and Anthropic just won't be able to compete with a national project by one of the world's superpowers, if it comes down to it;

5. Winning doesn't necessarily mean being the best. Often it's just being good enough;

6. As an example of a national project, China is busy replicating EUV because of the US ban on ASML and Nvidia exporting their best stuff. I don't think many in the West are prepared for how rapid this will be. I'm reminded of the policy debate in 1945 when many in American policy and military circles thought the USSR would never catch up with the atomic bomb or, if they did, it would take 20+ years. It took 4 years. For the hydrogen bomb, it took 1. The US hardware advantage is a lot more tenuous than many realize.
