by hirako2000
97 days ago
They are fully aware, but they are playing a different game. R&D isn't something where you flip a parameter and get what the efficiency-oriented pipelines do. Chinese models were built under constraints. As we know, limitations lead to innovation. So the "Chinese" R&D invested in optimisations. Teacher models were already there, so they likely built the best distillation processes, along with the best MoE. Actually, they published many of these works. Nuance, sure.
Anthropic/OpenAI could revise their philosophy to adopt efficiency. But momentum shouldn't be underestimated. Plus, dollars per optimisation is a different math altogether; it's not only about access to the latest Nvidia GPUs. At $400k per engineer a year, plus health coverage and pension contributions, hardware efficiency doesn't weigh as much as making sure engineering focuses on.. the raw power factor, I suppose.
|
Honestly, I wonder what you think closed LLM companies do R&D on if not optimizations. And the nature of research is that most ideas that sound good turn out to be duds, so they already need an established process for testing many ideas quickly. Now, if somebody publishes a new idea they haven't tried yet, setting up an experiment to try it out is just a routine task... But they aren't going to tell anybody the results; they'll just quietly integrate it if it works.