by anon373839
3 hours ago
> China's distillation labs

This notion that Chinese labs are merely distilling frontier models is quite an unwarranted slur. Those labs have published WAY more useful research than US labs on RL techniques, novel model architectures, training pipelines, etc. They have also hit intelligence-per-parameter densities that US labs have yet to attain.

Apart from that, merely training a model on outputs from another model, off policy and without the logits, doesn’t really work that well. The Chinese labs know how to build frontier level models. GLM-5.2 shows that they no longer even need Nvidia chips to do it.
Chinese labs are basically just telling everyone, out in the open, what they're doing and how to do it. The answer from American frontier labs is "Well, they couldn't possibly be getting the results they're getting without just distilling our models." Meanwhile, the American labs aren't even trying some of the same stuff, like DS's aggressive caching to get costs down.