by popinman322
504 days ago
DeepSeek was built on the foundations of public research, a major part of which is the Llama family of models. Prior to Llama, open-weights LLMs were considerably less performant; without Llama we might not have gotten Mistral, Qwen, or DeepSeek.

This isn't meant to diminish DeepSeek's contributions, however: they've been doing great work on mixture-of-experts models and really pushing the community forward on that front. And, obviously, they've achieved incredible performance.

Llama models are also still best in class for specific tasks that require local data processing. They also maintain positions in the top 25 of the lmarena leaderboard (for what that's worth these days, given suspected gaming of the platform), which places them in competition with some of the best models in the world.

But, going back to my first point, Llama set the stage for almost all open-weights models that followed. The Llama team spent millions on training runs whose artifacts will never see the light of day, testing theories that are too expensive for smaller players to contemplate exploring. Pegging Llama as mediocre, or a waste of money (as implied elsewhere), feels incredibly myopic.
|
That's not to say their work is unimpressive or unworthy. As you say, they've facilitated much of the open-source ecosystem and have been an enabling factor for many. It's more that their work has been in making the technology accessible, not necessarily in pushing the frontier of what's actually possible, and DeepSeek has shown us what's possible when you do the latter.