Hacker News
by obmelvin 512 days ago
Agreed. They accomplished a lot with distillation and optimization, but there's little reason to believe you don't also need foundational models to keep advancing. Otherwise, won't they run into problems from training on more and more synthetic data?

In a way, this is something most companies have been doing with their smaller models; DeepSeek just supposedly* did it better.