by obmelvin
512 days ago
Agreed. They accomplished a lot with distillation and optimization, but there's little reason to believe you don't also need foundation models to keep advancing. Otherwise, won't they run into issues training on more and more synthetic data? In a way, this is something most companies have been doing with their smaller models; DeepSeek just supposedly* did it better.